Preface


Although in this edition Applied General Statistics has been largely
rewritten, the objective is the same as that of the first edition: to describe
the more commonly used statistical methods and to illustrate their
applications in many fields.

The scope is essentially the same as that of the earlier edition, but the 
topic of statistical significance has been expanded, while the book as a
whole has been shortened some 115 pages. Nearly all of the illustrative 
examples are new and, as before, represent real rather than hypothetical 
data. The order of topics has been altered, the fitting of curves to fre- 
quency distributions and the discussion of significance tests having been 
moved to the end of the book. A few of the symbols have been changed, 
for simplicity and clarity. Each chapter which uses symbols is preceded 
by a symbol vocabulary for that chapter. The forthcoming fourth 
edition of Workbook in Applied General Statistics will conform to this 
edition in regard to symbols used and the order of topics. 

This second edition of Applied General Statistics was prepared by 
Frederick E. Croxton, except that Dudley J. Cowden provided a first
draft of three chapters. The third edition of Practical Business Statistics
will be prepared by Dudley J. Cowden. 

I am indebted to Professor Sir Ronald A. Fisher, Cambridge, to Dr. 
Frank Yates, Rothamsted, and to Messrs. Oliver and Boyd Ltd., Edinburgh,
for permission to reprint portions of Tables III and IV from their
book Statistical Tables for Biological, Agricultural, and Medical Research.
I am similarly indebted to Professor Egon S. Pearson and to the
Biometrika Trustees for permission to reprint the tables or portions of tables
from Biometrika and from E. S. Pearson and H. O. Hartley's Biometrika
Tables for Statisticians, Volume 1, which are shown here in Appendices I,
J, M, O, and P, as well as Charts 25.6 and 25.7. Other persons, and
organizations, who supplied data or gave permission to reprint material 
are acknowledged at the appropriate location. 

Many people helped with the work incidental to the revision of this 
textbook. Dr. James D. Paris prepared the first draft of the two chapters
on index numbers, and Robert E. Lewis wrote the initial draft of the 
description of the method of adjusting a seasonal index for the date of 
Easter. Portions of the manuscript were read by Alfred J. Kana,
Associate in Statistics, Columbia University. Charles H. Wittmann 
helped with some phases of the research incidental to the preparation of 
the manuscript. Julius I. Brown, Roy C. Calogeras (now Lecturer in 
statistics, Columbia University), and Donald W. Lovejoy performed 
computations. Keith Galli, Antony Herrey, Marcia Silfin, Mirian 
Weissman, and Faedon Xydis assisted in the drawing of charts, many 
of which were lettered by Marie Morisawa. Most of the manuscript was 
typed by Miss Hsi-lan Wang. To all of these who helped I express my
thanks, but my particular thanks are extended to my wife, Rosetta 
R. Croxton, who was ready to do whatever was needed whether it was 
computing, drawing charts, lettering, or working with me on the index. 

Frederick E. Croxton


Leonia, 
New Jersey 
CHAPTER 1


Introduction 


STATISTICAL DATA AND STATISTICAL METHODS 

The term statistics is used in either of two senses. In common parlance 
it is generally employed synonymously with the word data. Thus some- 
one may say that he has seen “statistics of industrial accidents in the 
United States.” It would be conducive to greater precision of meaning 
if we were not to use statistics in this sense, but rather to say “data (or
figures) of industrial accidents in the United States.” 

“Statistics” also refers to the statistical principles and methods which 
have been developed for handling numerical data and which form the 
subject matter of this text. Statistical methods, or statistics, range from 
the most elementary descriptive devices, which may be understood by 
anyone, to those extremely complicated mathematical procedures which 
are comprehended by only the most expert theoreticians. It is the pur- 
pose of this volume not to enter into the highly mathematical and theo- 
retical aspects of the subject but rather to treat of its more elementary 
and more frequently used phases. 

Statistics may be defined as the collection, presentation, analysis, and
interpretation of numerical data. The facts which are dealt with must be 
capable of numerical expression. We can make little use statistically of 
the information that dwellings are built of brick, stone, wood, and other 
materials; however, if we are able to determine how many, or what
proportion of, dwellings are constructed of each type of material, we have
numerical data suitable for statistical analysis. 

Statistics should not be thought of as a subject correlative with physics, 
chemistry, economics, and sociology. Statistics is not a science; it is a 
scientific method. The methods and procedures which we are about to 
examine constitute a useful and often indispensable tool for the research 
worker. Without an adequate understanding of statistics, the investi- 
gator in the social sciences may frequently be like the blind man groping in 
a dark closet for a black cat that isn't there. The methods of statistics
are useful in an ever-widening range of human activities, in any field of
thought in which numerical data may be had.

The derivation of the word “statistics” suggests its origin. The
administration of states required the collection and analysis of data of 
population and wealth for purposes of war and finance. Gradually data 
of more diverse nature were obtained for the general uses of government. 
Certain phases of statistics were developed by students of games of 
chance. Insurance and biology, as well as other natural sciences, were 
fertile fields for the application and development of statistical methods.
Today there is hardly a phase of endeavor which does not find statistical 
devices at least occasionally useful. Economics, sociology, anthropology, 
business, agriculture, psychology, and education— all lean heavily upon 
statistics. The medical research worker often must rely upon statistics 
to determine the significance of his results. The lawyer, especially 
if he is in corporation practice, may frequently find statistical devices 
of definite use. It should, of course, be added that the musician, the 
artist, the actor, and the writer of fiction would rarely have occasion 
to employ statistics, but even here the analysis of certain data of sales, 
box-office receipts, and trends of popular taste might prove useful. 

In defining statistics it was pointed out that the numerical data are col- 
lected, presented, analyzed, and interpreted. Let us briefly examine each 
of these four procedures. 

Collection. Statistical data may be obtained from existing published 
or unpublished sources, such as government agencies, trade associations, 
research bureaus, magazines, newspapers, individual research workers,
and elsewhere. On the other hand, the investigator may collect his own
information, going perhaps from house to house or from firm to firm to 
obtain the data. The first-hand collection of statistical data is one of 
the most difficult and important tasks which a statistician must face. 
The soundness of his procedure determines in an overwhelming degree the 
usefulness of the data which he obtains. 

The following chapter treats of these two methods of obtaining data. 
It should be emphasized, however, that the investigator who has experi- 
ence and good common sense is at a distinct advantage if original data 
must be collected. There is much which may be taught about this phase
of statistics, but there is much more which can be learned only through
experience. Although a person may never collect statistical data for his
own use and may always use published sources, it is essential that he have
a working knowledge of the processes of collection and that he be able to 
evaluate the reliability of the data he proposes to use. Untrustworthy 
data do not constitute a satisfactory base upon which to rest a conclusion. 

It is to be regretted that many people have a tendency to accept
statistical data without question. To them, any statement which is
presented in numerical terms is correct and its authenticity is automatically
established. Shortly after the retirement of a clerical employee of a
railroad, it was announced in the press that during his 43 years of employment
he had commuted a total of 1,200,000,000 miles. Most readers of the
statement probably accepted it without question. As a matter of fact, in
order for the figure to be correct the employee would have had to travel
approximately 3,200 miles each and every hour of every day during the
entire 43 years!
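
A figure of this kind can be tested with a few lines of arithmetic. The sketch below, written in Python, divides the reported mileage by the number of hours in 43 years. It assumes 365 days to the year and ignores leap days, which does not alter the conclusion.

```python
# Plausibility check of the reported commuting mileage.
# Assumption: 365 days per year (leap days ignored).
reported_miles = 1_200_000_000
years_employed = 43

hours_employed = years_employed * 365 * 24
miles_per_hour = reported_miles / hours_employed

print(f"Hours in {years_employed} years: {hours_employed:,}")
print(f"Required speed: {miles_per_hour:,.0f} miles per hour, around the clock")
```

The required speed comes out in the neighborhood of the 3,200 miles per hour cited in the text.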

Presentation. Either for one’s own use or for the use of others, the 
data must be presented in some suitable form. Usually the figures are 
arranged in tables or represented by graphic devices as described in Chap- 
ters 3 to 6. 

Analysis. In the process of analysis, data must be classified into
useful and logical categories. The possible categories must be considered 
when plans are made for collecting the data, and the data must be classi- 
fied as they are tabulated and before they can be shown graphically. 
Thus the process of analysis is partially concurrent with collection and 
presentation. 

There are four important bases of classification of statistical data: (1) 
qualitative, (2) quantitative, (3) chronological, and (4) geographical, each 
of which will be examined in turn. 

Qualitative. When, for example, employees are classified as union or 
non-union, we have a qualitative differentiation. The distinction is one 
of kind rather than of amount. Individuals may be classified concerning 
marital status, as single, married, widowed, divorced, and separated. 
Farm operators may be classified as full owners, part owners, managers, 
and tenants. Natural rubber may be designated as plantation or wild, 
according to its source. 

Quantitative. When items vary in respect to some measurable charac- 
teristic, a quantitative classification is appropriate. Families may be 
classified according to the number of children. Manufacturing concerns 
may be classified according to the number of workers employed, and also 
according to the value of goods produced. Individuals may be classified 
according to the amount of income tax paid. 

Most quantitative distributions are frequency distributions. The data 
of Table 8.3 show a frequency distribution of grades received by the 1952 
graduating class of the United States Merchant Marine Academy. A 
number of other frequency distributions are shown in Chapters 8, 9, and
10.

Sometimes, qualitatively classified data may be reclassified on a
quantitative basis by making very slight changes. The assets of a bank may be
listed in respect to degree of liquidity (cash, due from banks, United States
securities, marketable securities, call loans, eligible paper, other loans, real
estate loans, real estate, and furniture and fixtures). Although these
categories differ from one another in a more or less unassignable
quantitative fashion, the classification is actually made upon a qualitative basis.
If we should reclassify the bank assets according to the length of time
required to convert each into cash, the classification would be quantita- 
tive. In general the assets would be in the same order as before, but a few 
specific items among the less liquid qualitative groups (for example, cer- 
tain real estate and real estate loans) would be convertible into cash in a 
relatively short time. 

Chronological. Chronological data or time series show figures
concerning a particular phenomenon at various specified times. For example, the
closing price of a certain stock may be shown for each day over a period 
of months or years; the birth rate in the United States may be listed for 
each of a number of years; production of coal may be shown monthly for
a span of years. The analysis of time series, involving a consideration of 
trend, cyclical, periodic (seasonal), and irregular movements, will be 
discussed in Chapters 11 to 16. 

In a certain sense, time series are somewhat akin to quantitative
distributions in that each succeeding year or month of a series is one year or one
month further removed from some earlier point of reference. However, 
periods of time — or, rather, the events occurring within these periods— 
differ qualitatively from each other also. The essential arrangement of 
the figures in a time sequence is inherent in the nature of the data under 
consideration. 

Occasionally a time series may be converted into a frequency distribu- 
tion. If a railroad company has kept records of the number of railroad 
ties replaced each year, the data constitute a time series. When the same 
information is used in conjunction with the dates of installation, the life 
of the various ties may be expressed as a frequency distribution, showing 
perhaps: 

    Length of life          Number of ties
    4 but under 5 years            2
    5 but under 6 years            5
    6 but under 7 years           17
    etc.                         etc.
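
The conversion just described may be sketched in Python. The records below are hypothetical (they are not the railroad's figures), and the life of a tie is approximated as the difference between the year of replacement and the year of installation.

```python
from collections import Counter

# Hypothetical records: (year installed, year replaced) for individual ties.
# These figures are invented for illustration only.
tie_records = [
    (1940, 1944), (1940, 1945), (1941, 1946), (1941, 1947),
    (1942, 1948), (1942, 1948), (1943, 1949), (1943, 1950),
]

# Life in whole years: a life of 4 falls in the class "4 but under 5 years".
lives = [replaced - installed for installed, replaced in tie_records]

# Count the ties falling in each one-year class.
frequency = Counter(lives)

print(f"{'Length of life':<24}{'Number of ties':>14}")
for life in sorted(frequency):
    label = f"{life} but under {life + 1} years"
    print(f"{label:<24}{frequency[life]:>14}")
```

The yearly replacement counts alone would not suffice; it is the pairing with installation dates that allows each tie to be assigned a length of life.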

Geographical. The geographical distribution is essentially a type of 
qualitative distribution, but is generally considered as a distinct
classification. When the population is shown for each of the states in the United
States, we have data which are classified geographically. Although there 
is a qualitative difference between any two states, the distinction that is
being made is not so much one of kind as of location. Geographically
classified data are shown in Tables 3.1 and 3.4 and in Charts 6.19-6.22.

Sometimes a geographical distribution may be put into the form of a
frequency distribution. Thus, if we had data of the yield of corn per 
acre in each county of Iowa, we would have a geographical series. These 
data may be put into the form of a frequency distribution by stating the 
number of counties having yields per acre of “10 and under 15 bushels,”
“15 and under 20 bushels,” and so forth.

The presentation of classified data in tabular and graphic form is but 
one elementary step in the analysis of statistical data. Many other
processes are described in the following pages of this book. Statistical
investigation frequently endeavors to ascertain what is typical in a given
situation. Hence all types of occurrences must be considered, both the
usual and the unusual. 

In forming an opinion, most individuals are apt to be unduly influenced 
by unusual occurrences and to disregard the ordinary happenings. In 
any sort of investigation, statistical or otherwise, the unusual cases must
not exert undue influence. Many people are of the opinion that to break 
a mirror brings bad luck. Having broken a mirror, a person is apt to be 
on the lookout for the expected “bad luck” and to attribute any untoward
event to the breaking of the mirror. If nothing happens after the mirror 
has been broken, there is nothing to remember and this result (perhaps the 
usual result) is disregarded. If bad luck occurs, it is so unusual that it is 
remembered, and consequently the belief is reinforced. The scientific 
procedure would include all happenings following the breaking of the 
mirror, and would compare the “resulting” bad luck to the amount of bad
luck occurring when a mirror has not been broken. 

Statistics, then, must include in its analysis all sorts of happenings. If 
we are studying the duration of cases of scarlet fever, we may study what 
is typical by determining the average length and possibly also the diver- 
gence below and above this average. When considering a time series 
showing steel-mill activity, we may give attention to the typical seasonal 
pattern of the series, to the growth factor (trend) present, and to the 
cyclical behavior. Sometimes it is found that two sets of statistical data 
tend to be associated. In Chapter 19 it is pointed out that there is an 
association between temperature and the rapidity with which crickets 
chirp. If the temperature increases, the crickets chirp faster; if the tem- 
perature decreases, the crickets chirp more slowly. The relationship can 
be expressed mathematically and we can estimate the rapidity of crickets'
chirps from the temperature; or, conversely, we can make a good estimate
of the temperature based upon the rapidity of chirps.
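
One common way of expressing such a relationship mathematically is a straight line fitted by least squares. The Python sketch below uses hypothetical observations (invented for illustration, not the data of Chapter 19) and fits two lines: one for estimating chirps from temperature and a separate one for estimating temperature from chirps.

```python
# Hypothetical paired observations, invented for illustration:
# temperature in degrees Fahrenheit and cricket chirps per minute.
temperature = [60, 65, 70, 75, 80, 85]
chirps = [82, 98, 121, 139, 162, 178]

def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Line for estimating chirps from temperature.
a, b = least_squares(temperature, chirps)
# Line for estimating temperature from chirps (a separate fit).
c, d = least_squares(chirps, temperature)

print(f"chirps = {a:.1f} + {b:.2f} * temperature")
print(f"temperature = {c:.1f} + {d:.3f} * chirps")
```

The two lines are fitted separately because, when the points do not fall exactly on a line, the best line for estimating one variable is not simply the other line turned around.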

Occasionally a statistical investigation may be exhaustive and include
all possible occurrences. More frequently, however, it is necessary to 
study a smaller group or sample. If we desire to study the expenditures 
of lawyers for life insurance, it would hardly be possible to include all 
lawyers in the United States. Resort must be had to a sample; and it is 
essential that the sample be as nearly representative as possible of the 
entire group, so that we may be able to make a reasonable inference as to
the results to be expected for an entire population. The problem of 
selecting a sample is discussed in the following chapter. In Chapters 24, 
25, and 26 an attempt is made to determine how much reliance may be
placed in the results obtained from samples. 

Sometimes the statistician is faced with the task of forecasting. He
may be required to prognosticate the sales of automobile tires a year
hence, or to forecast the population some years in advance. Several
years ago a student appeared in a summer session class of one of the
writers and in a private talk announced that he had come to the course for 
a single purpose: to get a formula which would enable him to forecast the 
price of cotton. It was important to him and to his employers to have 
some advance information on cotton prices, since the concern purchased 
enormous quantities of cotton. Regrettably, the young man had to be 
disillusioned. To our knowledge, there are no magic formulae for
forecasting. This does not mean that forecasting is impossible; rather it
means that forecasting is a complicated process of which a formula is but
a small part. And forecasting is uncertain and dangerous. To attempt 
to say what will happen in the future requires a thorough grasp of the 
subject to be forecast, up-to-the-minute knowledge of developments in
allied fields, and recognition of the limitations of any mechanical
forecasting device. Further comments concerning forecasting are to be found
in Chapter 22. 

Interpretation. The final step in an investigation consists of
interpreting the data which have been obtained. What are the conclusions
growing out of the analysis? What do the figures tell us that is new or 
that reinforces or casts doubt upon previous hypotheses? The results 
must be interpreted in the light of the limitations of the original material. 
Too exact conclusions must not be drawn from data which themselves are 
but approximations. It is essential, however, that the investigator
discover and clarify all the useful and applicable meaning which is present in
his data. 


A FEW IMPROPRIETIES 

The research worker must be constantly on the alert to avoid any mis- 
uses of his material. Illogical and careless reasoning or improper use of
data will destroy the value of a study which may be technically acceptable 
in its earlier phases. A few examples of fallacious procedures may clarify 
this point. In later chapters of the book, other fallacies are occasionally 
mentioned in connection with the methods to which they apply.

Bias. The presence of bias on the part of an investigator is, obviously, 
sufficient to discredit the entire undertaking. Bias may be conscious or 
deliberate; in such a case it is synonymous with falsification. On the 
other hand, an unconscious bias may be operative, and this, perhaps, is a 
more dangerous form, since the analyst himself may not be aware of it.
The following is an illustration of apparently unconscious bias:¹

A friend had invited an acquaintance to lunch, and found at the end
of the meal that he had left his purse in the office and had no money.
The acquaintance, at his request for a loan, took out a five-dollar bill
and a ten-dollar bill. My friend took one of them (to this day he does
not know which), telling his acquaintance not to let him forget the loan.
He did forget it, however, until several weeks later when they met again,
and each wrote on a piece of paper the sum he thought had been
borrowed. The lender wrote ten, and the borrower five. They were both
psychologists, so each searched his memory carefully, and each had
circumstantial evidence that seemed to each conclusive, to prove himself
right. Neither cared about or needed the money especially, but to them
it indicated a universal principle, that each of us interprets and
remembers facts in the form most agreeable to himself. No wonder both sides
must be represented in courts of law, and that much honestly given
evidence must be rejected!

As will be seen in the following chapter, statistical data cannot be 
picked out of thin air as the conjurer appears to produce coins at his finger 
tips. The process is one requiring care and attention to details. The 
data, when obtained, should be of value and not be casually disregarded. 
Note what a reviewer said of a certain author:

Blank is thorough and undaunted. Have statistics on any subject
been collected before? He has collected more and better ones. If it is
by its intrinsic nature unchartable, he has charted it none the less. . . .
Chronology itself fares badly in his hands at times. If his examples
require to be a century or two misplaced, Blank can forget even his
statistics and his charts in the good cause of logic.

Omission of important factor. Shortly after the introduction of
the all-metal top for automobiles, a certain manufacturing company felt
called upon to prove that all-metal tops did not result in hotter car 
interiors. They suggested a test involving three steps: 

¹ From “Mind of a Child,” by Jessica Cosgrove, Good Housekeeping, January
1927, p. 206.

1. Take a piece of top fabric about 8 inches square. Place a piece of
lining material of similar size beneath the fabric, and a thermometer
beneath the lining material.

2. Take a piece of highly finished steel about 8 inches square. Place
similar sized pieces of ¾-inch felt and lining material beneath the
metal, and a thermometer beneath the lining material.

3. Place each of the above assemblies on a board at room temperature. 
Carry the entire apparatus out into hot sunshine, leave it exposed 
for about 10 minutes, and then read the temperature of the two 
thermometers. 

The difficulty with the above experiment is that the reader is asked, in
step 2, to use a piece of highly finished steel. Automobile tops are painted
(some of them with black or a dark color of paint) and therefore absorb
more heat than does highly finished steel. The obvious fallacy in the test
vitiates the experiment, although the additional insulation may actually 
make the metal-top car cooler than the fabric-top car. 

Carelessness. We cannot go through life without making mistakes, 
but carelessness should be reduced to a minimum. The wife of one of the 
authors wrote to a large department store to ask the size of a cedarized 
storage chest. The reply said, “This merchandise is available in the
3" x 1" x 1½" size.”

Many of us have received sealed envelopes minus enclosures, or postal 
cards blank on the message side, and have, perchance, been guilty of
sending the grocer's bill back to the grocer minus the check or with the check
unsigned. 

A study of salaries was under way and a certain corporation had been 
requested to furnish data concerning its employees. A note to its report 
appeared substantially as follows: “All salaries under $5,000 per annum
are shown as the maximum for each type of work. The assistant to the 
auditor stated that the maximum is equivalent to a general average for 
each group.” Perhaps this is an illustration of a conscious bias on the
part of the assistant to the auditor. It must be obvious that, if the
maximum and the average are the same, then there are no values below the
maximum.

A chain store advertised chuck roast at 49 cents per pound. In one of 
its stores there were nine chuck roasts, all wrapped in transparent
material and labelled as to price per pound (49¢), weight, and price for the
piece. Three of the roasts were marked as follows: 3 lb. 9¾ oz., $2.92;
4 lb. 15¾ oz., $4.05; 4 lb. 12¼ oz., $3.86. Division of these prices by their
weights will show that the charge was at the rate of 81 cents per pound, a 
price much higher than that current at the time for chuck roast. Several 
months later similar mispricings were observed in the same store for legs 
of lamb, so possibly this illustration should be listed under a heading
other than “carelessness.”
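
The division mentioned for the chuck roasts can be carried out as in the Python sketch below. The fractional ounces are read here as ¾, ¾, and ¼; that reading of the marked weights is an assumption and should be checked against the printed figures.

```python
advertised_rate = 0.49  # dollars per pound, as labelled

# (pounds, ounces, marked price in dollars); fractional ounces are assumed.
roasts = [
    (3, 9.75, 2.92),
    (4, 15.75, 4.05),
    (4, 12.25, 3.86),
]

rates_charged = []
for pounds, ounces, price in roasts:
    weight_lb = pounds + ounces / 16      # convert to decimal pounds
    rate = price / weight_lb              # dollars per pound actually charged
    rates_charged.append(rate)
    fair_price = advertised_rate * weight_lb
    print(f"{weight_lb:.3f} lb marked ${price:.2f}: "
          f"${rate:.2f} per lb (at 49 cents the price would be ${fair_price:.2f})")
```

Under that reading of the weights, each package works out to about 81 cents per pound rather than the labelled 49 cents.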

Non-sequitur. A weekly news magazine, the circulation of which
had been growing in a healthy fashion, undertook to demonstrate, for a
particular year, that its readers greatly exceeded its circulation. After
showing figures of its circulation, the magazine stated: “And each of these
subscribers represents 3.26 cover-to-cover readers, according to former
Deputy Police Commissioner ____, who counted and identified [sic]
216,948 fingerprints on copies his operatives had picked up at random
from subscribers' homes in seven different cities or towns.” How could
the investigator know the fingerprints belonged to cover-to-cover readers?
Or, did he find each fingerprint on every page and, if so, does that prove
each page was read? Do you ever actually read a magazine from cover
to cover?

Non-comparable data. In July 1936, newspapers carried reports of
a meeting of the American College of Osteopathic Obstetricians at which a 
doctor is reported, by a metropolitan paper, to have stated that the mater- 
nal death rate among mothers treated by osteopathic physicians was less 
than half that among cases handled by the medical profession. The 
higher rate in the latter instance was said to be due to excessive use of 
anaesthetics, interruption of labor, and undue reliance on mechanical 
devices. A survey of 14,000 osteopathic delivery cases was said to show 
a maternal death rate of 2.8 per thousand cases. This figure was
compared with the nation's average of more than 6 per thousand. It should
be obvious that the average rate for the entire country is not representa- 
tive of the rate for cases attended by the medical profession, since many 
maternity cases are not attended by physicians. 

The makers of a small, inexpensive car had been stressing the fact that 
the introduction of their car had converted many used-car buyers into 
new-car owners. Concerning costs of operation, they pointed out that 
“owners report up to thirty-five miles to the gallon of gasoline, which 
compared with the average mileage obtained with a used car . . . is a 
saving of great importance to persons in the low-income group.” The
comparison of maximum mileage for one type of car with average mileage
for other types of used cars is certainly unjustified.

Confusion of association and causation. Sometimes factors which 
are associated are erroneously regarded as being causally related. A 
southern meteorologist discovered that the fall price of corn is inversely 
related to the severity of hay fever cases. This does not imply that the 
low price of corn causes hay fever to be severe, nor does it imply that 
severe cases of hay fever bring about a drop in the price of corn. The 
price of corn is generally low when the corn crop has been large. When
the weather conditions have been favorable for a bumper corn crop, they
have also been favorable for a bumper crop of ragweed. Thus the fall
price of corn and the suffering of hay fever patients may each be traced
(at least partly) to the weather, but are not directly dependent upon each
other. A further discussion of association and causation is given in 
Chapter 19. 

Another instance of the confusion of association with causation
occurred in a statement by a research organization which, having studied
annual data, said, “When farm income goes up, factory payrolls
invariably follow, but they do not lead the procession. One is cause, the other
effect.” If such a procession does exist, it can hardly be shown by annual
data. If factory payrolls follow farm income, we should show that fact
by plotting monthly data as is done for other series in Chart 22.9 and
Chart 22.10. As to the causal relationship, it is fairly obvious that, while
an increase (or decrease) in farm income does have a corresponding effect
upon factory payrolls, the payrolls in turn have a reciprocal effect upon
farm income. Furthermore, both are dependent upon any other factors 
which tend to affect the pattern of general business. 

Insufficient data. Insufficient data result in a high degree of
uncertainty respecting any conclusion which may be made from them. A very
small sample may lead us to a correct conclusion, but we cannot place a
high degree of assurance in our conclusion. When a medical worker is
developing a new treatment, he does not announce its efficacy after trying
it out on a few individuals. He must have enough data so that he can be
relatively sure of results. If two or three subjects respond favorably, he
cannot be safe in claiming that the occurrences were not due to chance.
The favorable responses of these few might have come without the
treatment, or in spite of it! Of course, there must be a “control” group to
show how the subjects would respond without any treatment, or with the
usual treatment. Moreover, both the control group and the treated
group must be sufficiently large to warrant a conclusion. A discussion
of the reliability of values computed from samples is given in Chapters
24-26.
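
The part played by chance with very few subjects may be illustrated by a short calculation. The Python sketch below assumes, purely for illustration, that half of all untreated subjects would respond favorably and that subjects respond independently of one another; neither figure comes from the text.

```python
from math import comb

def prob_at_least(k, n, p):
    """Chance of k or more favorable responses among n independent subjects,
    each responding favorably with probability p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Assumed (hypothetical) rate of favorable response without the new treatment.
p_untreated = 0.5

three_of_three = prob_at_least(3, 3, p_untreated)
thirty_of_thirty = prob_at_least(30, 30, p_untreated)

print(f"3 of 3 favorable by chance alone:   {three_of_three:.3f}")
print(f"30 of 30 favorable by chance alone: {thirty_of_thirty:.2e}")
```

Under these assumptions three favorable responses out of three would occur by chance about one time in eight, whereas thirty out of thirty would almost never do so.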

Unrepresentative data. Conclusions may be based upon data which
are numerically sufficient, but which are not representative. A small
sample may be representative; on the other hand, a large sample may not
be representative.

An example of a conclusion based upon unrepresentative data is the 
forecast of the 1936 presidential election as made by the Literary Digest.
More than 10,000,000 straw ballots were sent out by the Digest. Of
these, 2,376,523 were returned and they indicated that 370 electoral votes
would be cast for Landon and 161 for Roosevelt. The final election
results were 523 electoral votes for Roosevelt and 8 for Landon. The
difficulty was that the mailing lists used as a basis for the poll were
relatively heavily weighted with persons in the upper economic brackets and
thus were not representative of the entire voting population. 
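
The point that a large sample may mislead where a small one does not can be illustrated by a simulation. The Python sketch below is not a reconstruction of the Digest poll: the electorate, the income split, and the preferences are all invented, and a fixed seed is used only so that the run can be repeated.

```python
import random

rng = random.Random(1936)  # fixed seed so the run is repeatable

# Hypothetical electorate (all figures invented): 20% upper-income voters,
# 70% of whom favor candidate A; 80% other voters, 35% of whom favor A.
def make_voter(upper_income):
    favors_a = rng.random() < (0.70 if upper_income else 0.35)
    return (upper_income, favors_a)

population = [make_voter(i < 20_000) for i in range(100_000)]

def share_for_a(voters):
    """Proportion of the given voters who favor candidate A."""
    return sum(v[1] for v in voters) / len(voters)

true_share = share_for_a(population)

# Small but representative: every voter is equally likely to be drawn.
small_sample = rng.sample(population, 1_000)

# Large but unrepresentative: the "mailing list" is weighted heavily
# toward upper-income voters.
upper = [v for v in population if v[0]]
other = [v for v in population if not v[0]]
large_sample = rng.sample(upper, 16_000) + rng.sample(other, 4_000)

print(f"True share for A:                   {true_share:.3f}")
print(f"Representative sample of 1,000:     {share_for_a(small_sample):.3f}")
print(f"Unrepresentative sample of 20,000:  {share_for_a(large_sample):.3f}")
```

The sample of 20,000 is twenty times the size of the other, yet it misses the true share by a wide margin because of the way it was drawn.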

Concealed classification. Conclusions drawn from statistical data 
may sometimes be invalid because of the presence of a concealed classi- 
fication which is overlooked. The fallacy of concealed classification is 
illustrated by some data which appeared in the Monthly Labor Review and 
concerning which its readers were warned. Data were presented showing 
the union wage rates in Hebrew and in non-Hebrew bakeries. It 
appeared from the figures that Hebrew bakeries paid an average hourly 
rate about 50 per cent higher than non-Hebrew bakeries. Qualifying this, 
the Review said, “Although Hebrew bakeries generally have higher rates, 
one reason for this large difference is the fact that a large proportion of 
the Hebrew bakeries are located in New York City, where the average 
of all rates is higher than in other localities.” 

A concealed classification was found to be present in a study of suicides. 
The data seemed to show that suicides were more likely to occur among 
certain religious groups than among others. Upon further consideration 
it was apparent that the matter of the urban or rural occurrence of the 
suicides had been overlooked. Hence the conclusion should have been — 
not that suicides tended to tie up with given religious groups — but that 
suicides were more common in urban territories and that these religious 
groups were also more numerous in the cities. 
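
A small numerical sketch shows how a concealed classification can manufacture a difference between two groups. The figures are hypothetical (they are not the Monthly Labor Review data): both groups of bakeries are assumed to pay identical rates within each locality, and only the mix of localities differs.

```python
def mean_rate(shops):
    """Average hourly rate, weighting each (shop_count, rate_cents) pair by shop_count."""
    shops = list(shops)
    total_shops = sum(count for count, _ in shops)
    return sum(count * rate for count, rate in shops) / total_shops


# Hypothetical data: locality -> (number of bakeries, hourly rate in cents).
# Within each locality the two groups pay exactly the same rate.
group_a = {"New York City": (80, 150), "other localities": (20, 100)}
group_b = {"New York City": (10, 150), "other localities": (90, 100)}

print(mean_rate(group_a.values()))  # overall average for group A
print(mean_rate(group_b.values()))  # overall average for group B
```

Under these assumed figures group A averages 140 cents and group B 105 cents, a difference of one-third, although no bakery pays more than its counterpart in the same locality. The whole gap comes from the overlooked classification by locality.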

Failure to define units. In a pamphlet given to each motorist with 
his renewal of an automobile vehicle or driver’s license, a state automo- 
bile commissioner called attention to the fact that 26 years earlier the 
“mileage death rate” had been 23.6 while in the year just ended there had 
been a “mileage death rate” of 4.2. There was no explanation of whether 
this was the number of deaths per mile — or per thousand miles — of high- 
way in the state, or the number of deaths per hundred, per thousand, or 
per million miles of vehicle travel during the year. Certainly it was not 
deaths per mile of vehicle travel, although at a quick reading that was 
what it seemed to be. Inquiry revealed that the ratio was the number of 
highway fatalities per hundred million miles of vehicle travel. The 
mileage was obtained by multiplying the number of gallons of gasoline 
sold in the state during the year by 13.12, the average mileage per gallon 
of gasoline. Incidentally, one may well wonder about the accuracy of 
this average and how it was obtained. Gasoline sales were, of course, 
available from state tax records. 
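
The definition that the inquiry uncovered can be written out as a short calculation. The sketch below uses the 13.12 miles per gallon quoted above; the numbers of deaths and of gallons sold are hypothetical, chosen only so that the result lands near the published 4.2.

```python
MILES_PER_GALLON = 13.12  # average mileage per gallon quoted by the commissioner


def mileage_death_rate(deaths, gallons_sold, mpg=MILES_PER_GALLON):
    """Highway fatalities per hundred million miles of vehicle travel."""
    vehicle_miles = gallons_sold * mpg  # estimated travel, from gasoline sales
    return deaths / vehicle_miles * 100_000_000


# Hypothetical inputs, not figures from the pamphlet.
rate = mileage_death_rate(deaths=551, gallons_sold=1_000_000_000)
print(round(rate, 1))
```

Writing the unit into the function name or its documentation is the programming counterpart of defining the unit in print: the same 551 deaths would give a very different figure per mile of highway or per thousand miles of travel.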

Misleading totals. Those of us who read the sport pages of the 
newspaper are likely to have noticed a statement each autumn to the 
effect that a certain number of thousand (or million) fans had watched 
the home team play during the baseball season just ended. For example, 
it was stated that 1,538,007 fans attended the home games of the New 
York Yankees during the 1953 baseball season. This figure was arrived 
at by adding the number of persons attending each home game. It does 
not, as is too often carelessly said or intimated, represent 1,538,007 fans, 
but rather the specified number of admissions, many individuals having 
attended more than one game. 
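
The distinction between admissions and individuals is the distinction between adding counts and counting distinct members. A toy sketch, with invented names and only three games, makes the difference concrete:

```python
# Hypothetical attendance records: one set of spectators per home game.
games = [
    {"Ann", "Bob", "Cy"},
    {"Ann", "Bob"},
    {"Ann", "Dee"},
]

# Admissions: each person is counted once for every game attended.
admissions = sum(len(spectators) for spectators in games)

# Distinct fans: each person is counted once, however many games attended.
distinct_fans = len(set().union(*games))

print(admissions, distinct_fans)
```

Here seven admissions represent only four persons. A season total built by adding gate counts is of the first kind and says nothing directly about the second.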

A somewhat similar meaningless, but impressive-sounding, total was 
present in a statement made by a horticultural concern that had recently 
acquired another similar company, which itself represented a recent 
merger of two other concerns. The statement was to the effect that their 
combined horticultural experience now totalled 295 years. This figure 
was obtained by adding the ages of the three companies. 

Poorly designed experiment. For an experiment to be valid, it 
must be so designed* that the results which are arrived at cannot be 
attributed to factors other than those which are under consideration. 
The illustration which follows will be mentioned again, in another con- 
nection, at the end of Chapter 25. At the time that fluorescent lighting 
was first introduced, some people believed that persons who were exposed 
to the radiation of the lights would become sterile. A railroad had 
already installed fluorescent lights and, hoping to counteract this belief, 
undertook an experiment in which one group of rats was subjected to 
incandescent light, while another group was subjected to fluorescent light. 
After a period of time the first group had the usual number of offspring, 
while the second group had none! A skeptical executive asked that the 
second group of rats be re-examined with care, and it was discovered that 
all of the rats of that group were of the same sex. It is elementary that 
the two groups should have had the same sex composition. 

* A brief discussion of experimental design will be found in O. L. Lacey, Statistical 
Methods in Experimentation, The Macmillan Co., New York, 1953, Chapter 2. 

RESEARCH METHODS 

It must not be assumed that the statistical method is the only method 
to use in research; neither should this method be considered the best 
attack for every problem. Just as the carpenter has a number of tools, 
each appropriate for a different sort of operation, so the researcher can 
avail himself of various techniques which are the tools of his trade and 
each of which is appropriate to a specific type of situation. If an amateur 
carpenter uses a screwdriver in lieu of a chisel, the results are not likely to 
be either workmanlike or satisfactory. Similarly, it is important that the 
investigator consider his problem carefully at the outset and make use of 
the technique or techniques which are appropriate to it. Just as the 
carpenter needs to use more than one tool in completing a piece of work, 
so the research worker must often make use of, not one, but several 
methods.† 

When we desire a great deal of information concerning each individual 
or occurrence to be studied, much of our data may be non-quantitative 
by its very nature. In such an event we employ the case study method of 
investigation, the purpose of which is to consider in detail the character- 
istics peculiar to the individual case and to generalize from a number of 
such detailed studies. Some of the information obtained in a study of 
case histories (such as wages, number of offspring, and so forth) may be 
statistical, and when many cases are included, statistical summaries may 
be made of the non-quantitative information obtained. 

If interest centers in changes in behavior or attitudes, the panel tech- 
nique may be used. This consists of interviewing the same group of 
people on two or more occasions. The panel procedure may obtain data 
of a quantitative nature when information concerning, for example, con- 
sumption habits and family budgets is obtained; as for case studies, 
statistical analyses may be made of non-quantitative information, such 
as opinions on public questions, if the panels are large enough. 

Sometimes a problem may be attacked by the historical approach. 
Although the historical method is largely descriptive and non-quantitative, 
we may find statistical aspects when we consider growth or decline of 
imports, exports, population, and other series. 

Again, the appropriate procedure may be to make use of the experi- 
mental method, in which we allow only the factor we are studying to 
vary, and attempt to control as many as possible of the other factors. 
For example, if we wished to study the effect of car weight upon tire 
mileage, we should control road conditions, speed, temperature, size of 
tire, quality of rubber and of cord, inflation of tire, and many other 
factors. 

In the social sciences, the experimental method can rarely be applied 
and certain aspects of the statistical method are used in lieu of it. We 
cannot, for example, ascertain the effect of different sorts of diets upon 
length of life, by forcing groups of people to live upon prescribed diets and 
by actually making all other phases of their lives identical. Instead, we 
must find groups of people on different diets, and then we must measure 
the importance of, and control statistically, as set forth in Chapter 21, as 
many as possible of the other phases of their lives, since we cannot control 
them experimentally. The experimental and statistical methods are not 
antithetical, but under practical conditions the statistical method supple- 
ments the experimental method. If an experiment could be so designed 
that all variables were completely controlled, statistics might not be 
needed. At best we can usually control but a few of the more important 
factors, and thus it is necessary to evaluate statistically the importance 
of a host of other minor disturbing factors (sometimes designated as 
“chance”), as described in Chapters 24-26. 

† Various methods are described in: Marie Jahoda and others, Research Methods in 
Social Relations, The Dryden Press, New York, 1951; Mildred B. Parten, Surveys, 
Polls, and Samples, Harper and Brothers, New York, 1950; and Manuel C. Elmer, 
Social Research, Prentice-Hall, Inc., New York, 1939. 

Some problems may be approached by the deductive method rather than 
by the inductive method. When a hypothesis has been set up deductively 
and when quantitative data are available, statistics may enable an 
inductive test to be made of the hypothesis, and this test may serve to 
support or to discredit the hypothesis. Conversely, relationships arrived 
at statistically (as, for example, the rather close negative association 
found in some states concerning the size of farms and the value of land 
per acre) may suggest causal connections which may be worked out 
deductively. Again we have two methods which are not antagonistic, 
but complementary. 

When a research worker undertakes the study of a topic, he may be 
able to choose between collecting the data himself or obtaining the needed 
figures from already available published or unpublished compilations. 
If an individual or organization has prepared reliable data which are 
pertinent to the problem, it is vastly less expensive to make use of the 
existing information. Although to collect one’s own data is more costly, 
that procedure may enable the investigator to obtain exactly the infor- 
mation which is needed to answer the specific questions that are under 
consideration. 

Not all readers will be faced with the problem of collecting original 
statistical data; many will find it possible to refer to existing sources for 
information. However, the data from such sources may be evaluated 
and more intelligent use may be made of them if the research worker has 
some knowledge of the procedure and pitfalls involved in collecting, edit- 
ing, and marshalling statistical data. 

An illustration cited by Stamp* is to the point: Harold Cox, when a 
young man in India, quoted some Indian statistics to a judge. The judge 
replied, “Cox, when you are a bit older, you will not quote Indian sta- 
tistics with that assurance. The government are very keen on amassing 
statistics; they collect them, add them, raise them to the nth power, take 
the cube root and prepare wonderful diagrams. But what you must 
never forget is that every one of those figures comes in the first instance 
from the chowty dar (village watchman), who just puts down what he 
damn pleases.” It should be added that this story refers to the India of 
a day long past. Today India has many able statisticians and an active 
statistical society. Presumably the chowty dar no longer functions as the 
source of local statistical information. 

* Sir Josiah Stamp, Some Economic Factors in Modern Life, P. S. King and Son, 
London, 1929, pp. 258-259. 

The process of collecting statistical data will be examined first. Later 
in this chapter, attention will be directed toward the use of statistical 
sources. 

COLLECTING STATISTICAL DATA 

Method of collection. Statistical data are frequently obtained by a 
process in which the desired information is obtained from the house- 
holder, business man, or other informant, either by having an enumerator 
visit the informant, ask the necessary questions, and enter the replies on a 
schedule, or by mailing to the informant a list of questions (sometimes 
called a questionnaire) which he may answer at his convenience. The 
data collected at each population census are obtained by the enumeration 
process, the enumerators undertaking to visit every place of abode in the 
United States. Sometimes information is obtained by registration, 
which means that the information is reported to the proper authority 
when, or shortly after, an event occurs. Thus births and deaths must be 
registered. In many states automobile accidents must be reported to 
the commissioner of motor vehicles. 

In general outline the problems of obtaining data by mailing question- 
naires, by enumeration, and by registration are similar. Under a system 
of registration there is, of course, the difficulty that many persons will 
neglect to register. Constant vigilance and frequent checkups are neces- 
sary on the part of the registrar. Registration, however, is usually with 
a properly designated government official, and there is ordinarily legal 
compulsion that the data be supplied. Since most statistical information 
is obtained by enumeration or by mailing questionnaires, the balance of 
this section will be devoted to the procedure for collecting data by these 
methods. 

Outline of procedure. The steps in a statistical investigation, 
which involves the collection of data, may be designated as follows: 

1. Planning the study. 

2. Devising the questions and making the schedule. 

3. Selecting the type of sample, if the enumeration is not to be a com- 
plete one. 

4. Using the schedules to obtain the information. 

5. Editing the schedules. 

6. Organizing the data. 

7. Making finished tables and charts. 

8. Analyzing the findings. 

The steps will usually be taken in the order shown, except that the 
decision concerning the type of sample to be used may be included as part 
of the first step. We shall discuss each of the eight steps in turn. 

1. Planning the study. If a topic is to be studied statistically, it 
behooves the investigator to become familiar, at the outset, with what 
has already been done by others. He may find that someone else has 
already examined the same topic and that his questions have already 
been answered. He may wish to design his study so that it can be com- 
pared with those which have preceded his. He will doubtless profit by 
the experience and the mistakes of others. He may find that the diffi- 
culties involved in the investigation of his topic are so great that they are 
insurmountable; the cost may be too great, or it may appear that inform- 
ants do not wish to divulge the type of information which is needed. 

Having studied what others have done, the investigator is ready to 
consider the general aspects of what he would like to know. If an 
employment and unemployment study is projected, there are many 
inquiries concerning each individual which are pertinent. The following 
suggests some of the more important ones: 

Does the individual have any dependents? How many? 

Is the person male or female? 

What is his or her marital status? 

How old is the person? 

Is he native white, native colored, or foreign born? If foreign born, 
from what country? 

Does he own property? 

What is his usual occupation? In what industry? 

What type of work is he doing at present? (If the study is a detailed 
one, consideration may be given to listing the job experience of the 
individual for a number of years, together with the wages received.) 

Is he employed full time? Part time? Is he entirely unemployed? 

If the individual is working part time or is totally unemployed, 
what is the reason? 

If he is totally unemployed, how long has he been so? Also, is he 
able to work and willing to work; or, alternately, is he actively looking 
for work? 

The reader will doubtless think of other questions of importance, but 
these suffice to indicate the nature of this preliminary step. Usually we 
cannot undertake to obtain answers to all the questions which are impor- 
tant. It may be too expensive to make so comprehensive an inquiry. 
There may be some questions (such as the one concerning property 
ownership or a query in regard to wages) which informants will often 
decline to answer. The most important and practicable questions are 
therefore selected to form the basis of the inquiry. It is these which will 
be incorporated into the schedule. 

There are several matters of general importance which are often con- 
sidered in connection with laying out the general plan. One of these has 
to do with the extensiveness of the study. Will it include the entire 
community or merely a sample? If funds and enumerators are available, 
we may make a complete enumeration; often we must be satisfied with a 
sample. We shall discuss the selection of the sample after we have com- 
pleted consideration of the schedule. 

Another problem concerns whether the schedule is to be sent out by 
mail (in which case it must be very simple and self-explanatory) or 
whether enumerators are to be used. If use is to be made of paid enumer- 
ators, it is necessary to locate qualified persons. However, it is often 
true that funds are not available to hire enumerators. In fact, it is some- 
times the case that, valuable as the results of an investigation might be, 
they are not worth what it would cost to employ enumerators! Studies 
have been made using, as unpaid enumerators, policemen, college 
students, postmen, truant officers, and even school children. 

A third matter has to do with the place where the informants will be 
interviewed. For an employment-unemployment study we could send 
enumerators to interview people at their work, in the streets, or at home. 
It is obvious that the last of the three is preferable. For the unemploy- 
ment study we should also consider whether or not to enumerate all the 
people in a household, irrespective of age, sex, desire for work, and mental 
or physical condition. To list everyone would give a complete picture, 
but it also involves much work. When making an employment study, 
we may not be interested in housewives who seek no work outside the 
home. We may be interested in elderly men, in an attempt to learn 
what proportion of the population is retired or is considered too old or 
infirm to work. Since young children are not ordinarily part of the labor 
force, it may be desirable to exclude all persons below (say) 14 or 16 years 
of age. For the purpose of the following illustration, we shall consider 
that all persons over 14 years of age were enumerated. 

2. Devising the questions and making the schedule. It has 
already been pointed out that not all the questions which we would like 
to have answered can be included in the schedule. Having selected those 
topics which we wish to include in our inquiry, we must formulate each 
question so that it may be readily and accurately answered, and then we 
must draft the schedule form. The schedule form shown on page 19 is 
one that might be used in a community study of employment and unem- 
ployment. This schedule would, of course, be supplemented by a sheet 
or booklet of instructions to the enumerators. The instructions would 
explain what is meant by “household” and by “family,” since both terms 
are used, whether age was to be “to nearest birthday” (the so-called 
“insurance method”) or “to last birthday” (the so-called “census 
method”), what categories are to be used for “race,” the meaning of the 
terms “occupation” and “industry,” and so on. 

              Urbantown Employment-Unemployment Study, 1954 

Name ....................   Area ..........   Household .......... 
Address .................   Card ..........   Enumerator ......... 

1. Relation to head of household ........  2. Age ....  3. Sex ....  4. Race .... 

5. Regular employment:              6. Present employment: 
   Occupation ..............           Occupation .............. 
   Industry ................           Industry ................ 

7. Circle one number to indicate what this person was primarily doing during 
   the week ending March 20, 1954: 

   01 Working for compensation in money or “kind.” 
   02 Self-employed. 
   Has a job or is self-employed, but not at work because: 
      03 On vacation. 
      04 Bad weather. 
      05 Labor dispute. 
      06 Layoff of 30 days or less. 
      07 Own sickness. 
      08 Other .............. 
   09 Not at work, new job to begin within 30 days. 
   10 Not at work, looking for work. 
   11 Casual worker, no regular job. 
   12 Attending school. 
   13 In the armed forces. 
   14 Keeping house (not as employee). 
   15 Unpaid worker on family farm or in family business. 
   16 Volunteer worker, not on family farm or in family business. 
   17 Retired. 
   18 Physically or mentally unable to work. 
   19 Inmate of institution. 
   20 Other .............. 

8. If this person worked at all last week, for compensation, or on family farm 
   or in family business, or as a self-employed person, how many hours did he 
   or she work? ........ hours. 

9. If this person was looking for work, how many weeks has he or she been 
   seeking employment? ........ weeks. 

Remarks .............. 

Urbantown Employment-Unemployment Schedule. 

VETERANS ADMINISTRATION 
WASHINGTON 25, D. C. 

(Veteran’s name, address, and policy number appeared here.) 

This is an appeal to you to cooperate in a scientific study which will 
almost certainly yield results of great importance to medicine and public 
health. 

The rapid increase in the use of tobacco in recent years has caused much 
discussion in medical circles concerning the possible effects of tobacco on 
health. The evidence presently available in regard to the subject does not 
clearly establish whether or not the use of tobacco is a serious hazard 
except for persons with certain diseases. It is necessary to gather the data 
from a large number of persons in order to obtain a dependable answer. 

Consequently the Veterans Administration is cooperating with the United 
States Public Health Service in a study of this question by distributing the 
enclosed questionnaire. 

Only a few minutes of your time will be required to complete it and an 
envelope which requires no postage is enclosed for your convenience in 
returning your questionnaire. I know you will feel a sense of personal 
satisfaction in helping the government make this valuable research study. 

With many thanks for your cooperation, I am 

yours 

Administrator of Veterans’ 


[Questionnaire form, cigarette section. The legible items are: an instruction 
to answer each of the questions which applies and, where the exact figure is 
not remembered, to enter the best estimate; usual occupation (the kind of work 
done during most of the respondent’s life, for example carpenter, punch-press 
operator, sales clerk, proprietor, and the kind of business or industry in which 
the employer was engaged); whether tobacco has ever been used in any form and 
for how many years altogether; the number of cigarettes smoked on the average 
at the present time and the number of years smoked at that rate; and the 
maximum number of cigarettes ever regularly smoked, with the number of years 
smoked at that rate. The answer categories run from smoking cigarettes once in 
a while but not every day, through less than 10 a day, 10 to 20 a day, and more 
than 20 but less than 40 a day, to 40 or more cigarettes a day.] 

One Side of a Questionnaire Used by the Veterans Administration and the 
United States Public Health Service for a Study of the Use of Tobacco. The 
reverse side asked concerning cigars, pipe smoking, and use of chewing tobacco and 
snuff. 

A portion of another schedule, which was sent by mail to insured 
veterans of World War I, is shown on page 20. The purpose of this 
schedule was to obtain data concerning the use of tobacco, which is to be 
studied in relation to cause of death as these veterans die. The schedule 
shown here contains only the section dealing with the use of cigarettes. 
Similar sections for cigars, pipe smoking, and use of chewing tobacco and 
snuff were on the reverse of the schedule. 

A third, and very simple, schedule is shown just below. This was a 
postcard to be returned to the Country Gentleman magazine. This form 
is of interest, not only because of its simplicity, but also because the Curtis 
Publishing Company sent a “shiny new dime” as a “token of apprecia- 
tion” to those cooperating. The company states that a postcard 
questionnaire, such as the one shown, will bring in a return of about 
20 per cent when no coin is sent. When a dime was sent, a return of 
65 per cent was obtained. It was also found that by using a quarter 
instead of a dime, the return could be brought up to about 70 per cent. 

1. How is your mail delivered?  R.F.D. or Star route ....  At Post Office ....  
   Door-to-door delivery .... 

2. What is the occupation of the head of your household? ........ 

3. What is his (or her) kind of business? ........ 

4. Do you live on a farm or ranch?  Yes ....  No .... 

5. If you do not live on a farm or ranch, does anyone in your household 
   a. Own or rent farm land?  Yes ....  No .... 
   b. Operate or work on a farm?  Yes ....  No .... 

6. If you are not a farmer, what is your interest in Country Gentleman? ........ 

Post-card Questionnaire Used by the Curtis Publishing Company. 
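
The return rates quoted for the postcard invite a rough cost comparison. The sketch below is an illustration under a stated assumption, namely that one coin is enclosed with every card mailed; postage, printing, and handling are ignored. Only the three return rates come from the text.

```python
def incentive_cents_per_return(coin_cents, returns_per_100):
    """Coin outlay, in cents, per completed card (assumes one coin per card mailed)."""
    return 100 * coin_cents / returns_per_100


def marginal_cents_per_extra_return(coin_a, returns_a, coin_b, returns_b):
    """Extra coin outlay per additional completed card when moving from plan a to plan b."""
    extra_outlay_per_100 = 100 * (coin_b - coin_a)
    extra_returns_per_100 = returns_b - returns_a
    return extra_outlay_per_100 / extra_returns_per_100


# Return rates from the text: 65 per cent with a dime, 70 per cent with a quarter.
dime = incentive_cents_per_return(10, 65)
quarter = incentive_cents_per_return(25, 70)
extra = marginal_cents_per_extra_return(10, 65, 25, 70)
print(round(dime, 2), round(quarter, 2), extra)
```

On this assumption the dime costs about 15 cents per completed card and the quarter about 36 cents, while each additional card gained by switching from a dime to a quarter costs 300 cents. Whether that is worth paying depends on how much the last few percentage points of return matter to the study.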

The construction of statistical schedules is something which is learned 
most satisfactorily by actually making and using them. Nevertheless, 
there are some cautions which are helpful: 

(a) Clarity is essential. The entire schedule, as well as each question, 
should be as simple and as clear as possible. This is particularly true of 
schedules sent to, or left with, persons to be filled out at their convenience. 
An ambiguous question or a question that invites an ambiguous answer 
produces useless data and involves waste of time and money. One 
organization, in making a study, queried some hundreds of parents: “Is 
your child’s outlook on life broader or narrower than yours was at the 
same age?” The investigator presumably expected the replies to read 
“Broader” or “Narrower.” Replies actually received, however, were 
frequently “Yes,” “No,” “I doubt it,” and “I hope so”— none of which 
had any meaning. Furthermore, the question is so worded as not to 
allow for the fact that there may be two or more children in the family. 

The inquiry concerning marital condition when put “Married or 
Single?” is open to two objections: (1) Either a “Yes” or a “No” answer 
is meaningless; (2) not all persons are included in these two categories. 
One good way of asking this question is to say: 

Check whether: 
    Single ........ 
    Married ........ 
    Widowed ........ 
    Divorced ........ 
    Separated ........ 

To clarify the meaning of “single,” the term “never married” is some- 
times used. 

The investigator should not be satisfied merely with wording his ques- 
tions so that they can be understood; he should draft them so carefully 
that they cannot be misunderstood. 

(b) Not all questions can be accurately answered. No matter how clearly 
a question is stated, there are some sorts of queries which are apt to elicit 
unsatisfactory returns. The schedule used in 1950 for the Census of 
Population and Housing of the United States asked for the age at last 
birthday for each person enumerated. Reference to the published 
results in 1950 Census of Population, Vol. II, Part 1, Table 94, shows some 
peculiar irregularities in the distribution of the population by single years 
of age. Beginning with age 25 and continuing through age 70, there are 
definite concentrations of persons on every age ending in 0 or 5, except* 
for age 55. For example, there are more people who were reported to be 
25 than either 24 or 26 years of age. There are also secondary concen- 
trations upon some ages which are multiples of 2, most noticeable when 
these even years of age are not adjacent to a multiple of five. Thus, 
there are concentrations at 28, 32, 38, 42, and so on, through 62. Fur- 
thermore, there seem to be too many males reported as 21 and too many 
non-white females reported to be 18 years old. The Enumerators Refer- 
ence Manual (p. 34) notes that some ages will be reported in round 
numbers and warns the enumerators as follows: “Estimate of Age: If a 
respondent gives an offhand estimate such as ‘around 60,’ try to find out 
whether the person is nearer 58 or 59 or possibly 61 or 62. Try to get it as 
accurate as possible. If age is not known, enter the estimate as the last 
resort, and footnote it as an estimate. An entry of ‘21 plus’ is not 
acceptable.” 

* Non-white and foreign-born white showed concentrations on 55. Native white 
showed a concentration on 54. 

The rounding of ages is not peculiar to the United States Census; it may 
be expected to occur in any inquiry where age is not obtained from birth 
certificates or some other accurate record of date of birth. Some of the 
factors believed to lead to reporting ages in round numbers are: (1) The 
information concerning an individual is not necessarily furnished to the 
enumerator by the person himself; it is often given by a relative, friend, 
rooming-house keeper, or other person, and some of these informants can- 
not have exact information. (2) When ages are intentionally misstated, 
as they occasionally are, there is reason for believing that they are often 
rounded. (3) Some persons are careless, or occasionally a person of low 
intelligence may always think in terms of round numbers. Rounding is 
most noticeable for those classes of the population in which the proportion 
of illiterates is greatest. (4) A few persons do not know their exact ages. 
(5) There may be carelessness on the part of enumerators. Some 
improvement in the accuracy of reporting ages may be had by asking date 
of birth instead of, or in addition to, age. It should be recognized, how- 
ever, that the posing of a more exact question does not produce better 
data when exact knowledge is lacking, as in the case of a landlady report- 
ing for her roomers. Furthermore, the matter of the expense involved in 
asking this additional question might more than offset the expected 
increase in accuracy. When age is of primary importance, as in the case 
of application for insurance, date of birth is usually asked and may be 
verified by documentary evidence. 
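
The concentration of reported ages on multiples of five can be summarized in a single figure. One measure commonly used by demographers for this purpose is Whipple's index, which is not mentioned in this chapter and is brought in here only as an illustration. It compares the number of persons reported at ages 25, 30, ..., 60 with one-fifth of all persons reported at ages 23 through 62. The age counts in the sketch are hypothetical.

```python
def whipple_index(counts_by_age):
    """Whipple's index of preference for ages ending in 0 or 5.

    counts_by_age maps a single year of age to the number of persons reported.
    A value of 100 indicates no preference; 500 means every reported age in
    the range 23-62 ends in 0 or 5.
    """
    total = sum(counts_by_age.get(age, 0) for age in range(23, 63))
    heaped = sum(counts_by_age.get(age, 0) for age in range(25, 61, 5))
    return 100 * heaped / (total / 5)


# Hypothetical returns: 100 persons at every age, with no rounding.
even = {age: 100 for age in range(23, 63)}

# Hypothetical returns with heaping: 150 persons at ages ending in 0 or 5.
heaped = {age: (150 if age % 5 == 0 else 100) for age in range(23, 63)}

print(whipple_index(even), round(whipple_index(heaped), 1))
```

The index measures only the symptom. It cannot say which of the five factors listed above produced the rounding, nor can it detect the secondary concentrations on even ages.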

Another interesting example of thinking in terms of round numbers 
occurred in the case of a contest sponsored by a motion picture theater. 
An irregular-shaped glass jar was filled with cranberries, and six prizes 
were offered to the patrons who guessed most nearly the correct number 
of cranberries in the jar. An analysis of the 1,996 guesses showed that 
there were 1,465 which ended in 0 or 5. 
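
The cranberry figures can be set against what would be expected if the final digit of a guess were equally likely to be any of the ten digits. That equal-likelihood standard is an assumption made for comparison, not something stated in the account of the contest.

```python
guesses = 1_996           # total guesses analyzed (from the text)
ending_in_0_or_5 = 1_465  # guesses whose last digit was 0 or 5 (from the text)

observed_pct = 100 * ending_in_0_or_5 / guesses

# Under the assumed standard, 2 of the 10 possible final digits are 0 or 5.
expected_count = guesses * 2 / 10

print(round(observed_pct, 1), expected_count)
```

Roughly 73 per cent of the guesses ended in 0 or 5, against an expected 20 per cent (about 399 guesses) under the assumed standard.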

(c) Certain types of questions should be avoided. When the prosecuting 
attorney asked the alleged wife beater, “Have you stopped beating your 
wife?” he attempted to put the defendant, whether he replied “Yes” or 
“No,” in the position of admitting that he had beaten his wife. In a 
scientific investigation we should scrupulously avoid such leading ques- 
tions. When asking the reason for unemployment in an unemployment 
survey, made during a depression, an enumerator would be suggesting
the answer if he said, “I suppose you are unemployed because of the
depression?” Rather, he should inquire, “What is the reason you are
unemployed?”

Questions which are unduly inquisitive or which are liable to offend 
should likewise be avoided. In a study of social workers, each married 
woman was asked whether or not she lived with her husband. The 
inquiry was injudicious, aroused resentment, and would hardly have been
productive of useful data if it had been answered by all the persons 
queried. Questions concerning personal matters (such as income) should 
be handled with tact — perhaps asked at the close of the interview after 
the cooperation of the informant has been obtained. Sometimes it is 
better not to ask such a question but to infer the general income level 
from knowing if there is a telephone in the home; if the home is owned,
and its apparent value; the individual’s occupation; make of car(s) 
driven, if any; servants employed, if any; and so forth. The 1950 Census 
of Population asked the amount of income for a twenty per cent sample 
of the population and, although this question— like all Census queries — 
was authorized by law, a special confidential form requiring no postage 
was provided for those who preferred to send this information directly 
to the Bureau of the Census. In one survey informants were asked: How 
much cash do you customarily carry on your person? How much cash 
do you ordinarily keep around the house? Many refusals to answer may 
be expected for such questions. 

(d) Answers should be objective and capable of tabulation. When factual
studies are being made, questions should be so designed that objective 
answers will be forthcoming. Instead of asking the condition of a build- 
ing and allowing the enumerator to state the condition in his own words, 
a study made by the United States Department of Commerce asked if a 
structure was in good condition, needed minor repairs, needed structural 
repairs, or was unfit for use. Although the answers to these questions 
are not completely objective, at least they are capable of being readily 
tabulated. 

(e) Instructions and definitions should be concise. The enumerator and 
informant should never be in doubt as to what information is desired and 
what terms or units are to be used. When inquiring as to the employ- 
ment status of an individual, the inquiry must refer to some specific time. 
Thus, the 1950 Census of Population asked information as of the week 
preceding the visit of the enumerator. 

If information is desired as to the exact situation of a part-time worker, 
it must be made clear whether the desired answer should be: (1) hours
per day; (2) hours (or days) per week; or (3) fraction of usual full time. 
The units used in a study should be clearly understood by both the
enumerator and the informant. If we are collecting data from farmers
and orchardists on apple production, we should specify whether we want
data in terms of bushels or boxes of fruit. If we desire information as to
the number of rooms in houses, it should be noted whether or not
bathrooms, kitchenettes, powder rooms, dressing rooms, and the like are to be
counted as rooms. 

(f) Arrangement of questions should be carefully planned. Not only 
must the questions be well arranged on the schedule form to allow proper 
space for answers, but the order of the questions should be such as 
to facilitate the answering of each question in turn. If a logical flow 
of thought is involved, it should be followed in the arrangement of ques- 
tions. Questions should not skip back and forth from one topic to 
another. 

After a schedule has been drafted, the desirable procedure is to try it
out with a group, discover its shortcomings, and then revise it in the light 
of the tryout. If there is not time for a tryout, ask some competent 
investigators to go over it and make suggestions for its improvement. 
When the final form of the schedule has been decided upon, careful 
instructions for filling it out should be prepared. If the schedules are to 
be mailed to the persons furnishing information, these directions should
be as clear and concise as possible. If enumerators are used, the
instructions to the enumerators should be complete in order to cover as many as
possible of the situations which may occur in their work. 

3. Selecting the type of sample. The United States Census of
Population is a complete enumeration of the inhabitants of the United
States. That is to say, it is as complete as it is possible to make it. A
very few people, such as tramps, fugitives from justice, and dwellers in 
extremely remote places, may not be included, but the intent is to include 
everyone, and no one is knowingly omitted. Similarly, the Census of 
Agriculture undertakes to include all farms in the United States as well 
as certain specialized operations³ including greenhouses, nurseries,
poultry yards, and apiaries. 

Sometimes a partial enumeration is used instead of a complete enumer- 
ation. Occasionally, only the larger units may be included. For 
example, the biennial Census of Manufactures for 1921-1939 included
only those establishments with annual products valued at $5,000 or more. 
The enumerations were incomplete in regard to number of establishments 
included, but included a high proportion of the total number of wage 

³ See U. S. Bureau of the Census, United States Census of Agriculture, 1950, Vol.
II, p. xxix.

earners in manufacturing and of the total value of manufactured products.
Following 1939, no Census of Manufactures was taken until 1947, when
all establishments employing one or more persons were included. In 1949 
an Annual Survey of Manufactures was instituted; the annual survey 
uses a sample, employing a combination of the procedures described in 
the following paragraphs. 

It may be too expensive or too time-consuming to attempt either a 
complete or a nearly complete coverage in a statistical study. Further- 
more, to arrive at valid conclusions, it may not be necessary to enumerate 
all or nearly all of a population. We may study a sample drawn from
the larger population and, if that sample is adequately representative 
of the population, we should be able to arrive at valid conclusions. There 
are various ways in which a sample may be selected from a population. 
No matter which of these is employed, it must be remembered that the 
cardinal purpose is to obtain a representative sample, that is, one which 
contains all elements in the same proportion as in the population from 
which it is drawn. In short, it is not merely a matter of grabbing any 2, 
5, 10, or 20 per cent sample of a population, but of selecting that sample 
in such a way that it will be as representative as possible. 

(a) Random sample. If a sample is drawn in such a way that each 
time an item is selected, each item in the population (or universe) has 
an equal chance of being drawn, the sample is said to be a random one. 
Under these conditions, each combination of a specified number of items 
will have the same probability of being selected. This is sometimes 
referred to as unrestricted or simple random sampling to differentiate it 
from sampling procedures which combine random sampling with other 
requirements, for example, the initial division of a non-homogeneous 
population into appropriate homogeneous sub-groups. 

When populations are homogeneous, in regard to the characteristic in 
which we are interested, random samples may be expected to produce 
satisfactory results. If, for example, a large receptacle contains a
population of thousands of marbles, 1/3 of which are white, 1/3 black, and 1/3 red,
and if those marbles are identical in size, shape, density, and all other 
characteristics except color, we have a homogeneous population. If the 
marbles can be thoroughly mixed, between each draw of a marble, by 
rotating the receptacle, or otherwise, randomness is not too difficult to 
achieve. Under the conditions indicated, it is more likely that a sample 
of marbles will show the three colors in the same proportion that they 
exist in the population than that these colors will be present in some other 
proportion. This does not mean that every sample will show the
proportion in the population; but if many samples are drawn they will tend to
do so. Furthermore, wide disagreements will rarely occur. 
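
The behavior just described can be tried out by simulation. The sketch
below assumes a population of 3,000 marbles and samples of 30; both
figures, and the 0.25 threshold used to count "wide disagreements," are
chosen only for illustration.

```python
import random
from collections import Counter

rng = random.Random(1955)  # fixed seed so that the run is repeatable

# Hypothetical population: 3,000 marbles, one third of each color.
population = ["white"] * 1000 + ["black"] * 1000 + ["red"] * 1000


def draw_sample(size):
    """Simple random sample without replacement: every marble has the
    same chance of being drawn."""
    return rng.sample(population, size)


# Draw many samples of 30 and record the share of white marbles in each.
shares = [Counter(draw_sample(30))["white"] / 30 for _ in range(2000)]

mean_share = sum(shares) / len(shares)
wide_misses = sum(abs(s - 1 / 3) > 0.25 for s in shares) / len(shares)

print(f"average share of white over 2,000 samples: {mean_share:.3f}")
print(f"fraction of samples off by more than 0.25: {wide_misses:.4f}")
```

Individual samples differ from one another, but their shares cluster
about one third, and samples that miss badly are uncommon.
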
In the illustration just given, randomness was not difficult to attain.
Suppose that a population consists of equal proportions of four sizes of
bolts and that all were made from the same material. In such a situation, 
mixing the bolts in a container will not help us to obtain a random sample 
of the various sizes, since smaller objects tend to gravitate to the bottom. 
Satisfactory mixing might possibly be obtained on a horizontal surface, 
but here one would have to be careful not to select the larger bolts because 
they are more prominent. A somewhat similar problem is met in 
sampling shipments of grain and of coal. For grain, the lack of
homogeneity is recognized and samples are sometimes taken by plunging a tube
vertically into the grain in several locations. This procedure is similar to 
stratified sampling described in section (d). 

Sometimes items cannot be physically mixed, yet a random sample is
desired. Mixing may be impossible because the items are bulky,
immovable, or fragile, or because they may be households or individual
persons. Again, mixing may be possible but may not assure randomiza- 
tion, since the individual selecting items from the mixed population may 
not pick the items at random. Randomization is sometimes achieved by 
assigning numbers to the items in the population and drawing the sample 
or samples by reference to a table of random numbers.⁴ This may be
referred to as “mechanical randomization,” the term being also applied
to the use of coins or dice. 
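
A minimal sketch of the numbering procedure follows. A seeded
pseudo-random generator stands in for the printed table of random
numbers, and the list of 40 households is hypothetical; neither is taken
from the text.

```python
import random

rng = random.Random(7)  # stands in for a printed table of random numbers

# Hypothetical list of 40 households, numbered 01 to 40.
households = {number: f"household {number:02d}" for number in range(1, 41)}


def draw_by_random_numbers(n_items, sample_size):
    """Read two-digit random numbers one after another, skipping any that
    fall outside 1..n_items or that have already been drawn."""
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(0, 99)  # one two-digit entry from the "table"
        if 1 <= number <= n_items and number not in chosen:
            chosen.append(number)
    return chosen


selected = draw_by_random_numbers(len(households), 8)
for number in selected:
    print(households[number])
```

Because the selection depends only on the numbers read, the person
drawing the sample has no opportunity to favor particular items.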

When samples are taken from each batch of screws, nails, bolts, brick,
wire, or other products of a factory, physical mixing may not be necessary
since the items may be selected from time to time from the production
stream. Such a method of selection is not exactly random and may, in 
fact, contain a bias if the machine, die, drill, jig, or other device used in 
producing the items tends to wear or get out of adjustment during the 
production of a batch. Selecting items from a production stream is 
somewhat akin to the method next described. 

(b) Systematic sample. When a sample is obtained by drawing every, 
say, tenth item on a list or in a file, the sample is a systematic one. The 
first item should be selected at random. Such a sample is sometimes 
drawn from an alphabetical list of names or from cards filed in alpha- 
betical, numerical, or other order. Certain population information 
called for on the schedule used for the 1950 Census of Population and 
Housing was obtained for but 20 per cent of the persons listed. To 
obtain this sample, every fifth line on a schedule was labeled “Sample 


⁴ For example, the table given in R. A. Fisher and F. Yates, Statistical Tables for
Biological, Agricultural and Medical Research, Hafner Publishing Company, Inc.,
New York, 1949, pp. 104-109.

line . . . ask ques. below.” Five forms of the schedule were printed,
each with a different arrangement of sample lines. 
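
The selection of every fifth item from a random start may be sketched as
follows. The 30-line schedule and the function name are assumptions made
for illustration; the Census itself fixed the sample lines in advance on
its printed forms rather than drawing a start for each schedule.

```python
import random


def systematic_sample(items, interval, rng):
    """Take every `interval`-th item, starting from a randomly chosen
    position among the first `interval` items."""
    start = rng.randrange(interval)
    return items[start::interval]


rng = random.Random(1950)
lines = list(range(1, 31))  # hypothetical schedule with 30 numbered lines
sample_lines = systematic_sample(lines, 5, rng)
print(sample_lines)
```

Only the starting point is left to chance; once it is fixed, the rest of
the sample is determined by the interval.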

It is important that the basic list, from which a systematic sample is
chosen, is actually the population which one desires to study. The
failure of the Literary Digest to forecast correctly the 1936 presidential
election was due to the fact that its apparently systematic sample of more 
than 2,300,000 ballots was not selected from an appropriate basic list. 
The voters were selected from lists of automobile owners and telephone 
subscribers, which, even more so in 1936 than would be true today, 
failed to include enough of those persons in the lower income groups.
A similarly incomplete list was used as the basis from which to draw a 
sample for an unemployment study in a New England city during the 
depression of the 1930’s. The sample was selected from the subscribers 
for electricity, gas, and water. The list did not include the poorest 
families. 

No general statement can be made to the effect that more reliable or 
less reliable results may be had from a systematic sample than from a 
random sample of the same size. The conditions under which systematic 
selection is to be preferred to random sampling, or vice versa, are too 
involved to be discussed here,⁵ but one caution should be mentioned.
The sampling intervals (every 5th item, every 10th item, on a list) must 
not coincide with any constantly recurring characteristics in the listing 
of the items. 
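
The caution can be illustrated with an invented listing in which every
tenth dwelling is a corner house. The listing and the intervals tried
are assumptions chosen to make the point and do not come from any actual
survey.

```python
# Hypothetical listing: 200 dwellings in street order, where every tenth
# dwelling is a corner house (an assumed recurring feature of the list).
dwellings = ["corner" if i % 10 == 0 else "interior" for i in range(200)]


def corner_share(start, interval):
    """Share of corner houses in a systematic sample with a given start."""
    sample = dwellings[start::interval]
    return sample.count("corner") / len(sample)


true_share = dwellings.count("corner") / len(dwellings)

# Interval 10 coincides with the recurring pattern: depending on the start,
# the sample holds only corner houses or none at all.
shares_interval_10 = [corner_share(start, 10) for start in range(10)]

# Interval 7 does not coincide with it, so every start gives a mixed sample.
shares_interval_7 = [corner_share(start, 7) for start in range(7)]

print("true share:", true_share)
print("interval 10:", shares_interval_10)
print("interval 7:", [round(s, 2) for s in shares_interval_7])
```

With the coinciding interval no possible start yields a sample resembling
the list as a whole, whereas with the other interval every start does
tolerably well.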

(c) Cluster sample. Before proceeding to describe a cluster sample, it 
will be useful to introduce the term sampling unit. The sampling unit
is the basic entity in any sample and may be a marble, a bolt, an indi- 
vidual, a manufacturing concern, a farm, a household, a geographic area, 
and so forth. In the case of the marbles, the units were simple and 
differed from each other only in regard to color. Other units may be 
complex and may differ from each other in many respects. For example, 
manufacturing concerns differ in regard to nature of product, capital 
invested, number of employees, and in many other ways. When our 
units are people, we find that they differ in respect to sex, age, race, 
occupation, employment status, economic status, religion, and so forth.
About all that they may have in common is that they are human beings 
and live in the same community. Such differences are important and 
need to be kept in mind when a sample is selected. The more unlike 
the sampling units, the more difficult is the problem of selecting a repre- 
sentative sample. 


⁵ M. H. Hansen, W. N. Hurwitz, and W. G. Madow, Sample Survey Methods
and Theory, Vol. I, John Wiley and Sons, Inc., New York, 1953, pp. 503-512.

The cluster sample is sometimes referred to as an area sample because
it is frequently applied on a geographical basis. Essentially it consists 
of a random selection of groups of units. For example, on a geographical 
basis, we might select blocks of a city or counties of continental United 
States. As a non-geographical illustration, the bolts of four sizes, 
previously mentioned, might be spread out on a horizontal surface marked 
into squares of equal size and a random sample of the squares taken. 
The blocks, counties, or squares constitute the clusters,⁶ and within each
group all of the units present may be included. Multi-stage sampling
involves samples of the units from the groups, or samples of sub-groups 
from the groups (for example, townships from the counties in the cluster), 
or both. Multi-stage sampling may also include other types of samples 
in one or more of the steps. 
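
The difference between taking whole clusters and sampling again within
them may be sketched as follows. The city of 50 blocks, the number of
households in each, and the sample sizes are all hypothetical.

```python
import random

rng = random.Random(42)

# Hypothetical city: 50 blocks, each holding between 5 and 20 households.
blocks = {
    block_id: [f"block {block_id} / household {h}"
               for h in range(rng.randint(5, 20))]
    for block_id in range(50)
}

# First stage: a random sample of blocks (the clusters).
chosen_blocks = rng.sample(sorted(blocks), 5)

# Cluster sample: every household in each chosen block is included.
cluster_sample = [hh for b in chosen_blocks for hh in blocks[b]]

# Multi-stage variant: a random sample of 3 households within each block.
two_stage_sample = [hh for b in chosen_blocks
                    for hh in rng.sample(blocks[b], 3)]

print(len(cluster_sample), "households in the cluster sample")
print(len(two_stage_sample), "households in the two-stage sample")
```

In both versions the enumerator visits only five blocks, which is the
source of the saving in travel discussed later in this section.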

(d) Stratified sample. When a population is known to be heterogeneous,
and when that heterogeneity has a bearing on the characteristic
being studied, the population may be divided into strata and random 
samples of units drawn from each stratum. The purchaser of a box of 
berries recognizes the existence of heterogeneity, and thus of strata, when 
she turns out the contents to examine the bottom as well as the top layers. 
Frequently, the number of units selected from each stratum is proportional
to the number of units in that stratum in the population. An
interesting application of the stratified sample was made in the study of
the effects of strategic bombing on Japanese morale⁷ made by the United
States Strategic Bombing Survey. One important provision in the selec- 
tion of this sample was that interviewers could make no substitutions for 
persons designated on the sampling lists. Substitutions for persons not 
at home, or otherwise not readily available, is a dangerous source of error 
in any type of sample. 
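
Proportional allocation, as mentioned above, may be sketched in a few
lines. The three strata and their sizes are hypothetical, and the
function name is introduced here only for illustration.

```python
import random

rng = random.Random(3)

# Hypothetical population of 1,000 units divided into three strata.
strata = {
    "stratum A": [f"A-{i}" for i in range(500)],
    "stratum B": [f"B-{i}" for i in range(300)],
    "stratum C": [f"C-{i}" for i in range(200)],
}


def proportional_stratified_sample(strata, total_size, rng):
    """Draw a random sample from each stratum, sized in proportion to the
    stratum's share of the population. Rounding can make the overall
    size differ slightly from total_size when the shares are uneven."""
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n = round(total_size * len(units) / population_size)
        sample[name] = rng.sample(units, n)
    return sample


sample = proportional_stratified_sample(strata, 100, rng)
for name, units in sample.items():
    print(f"{name}: {len(units)} of {len(strata[name])}")
```

Each stratum contributes one tenth of its units, so the sample has the
same composition as the population in respect to the strata.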

Note that stratified sampling cannot be used unless some information 
concerning the population and its strata is available. An extremely 
important point, which is often overlooked, is that the strata must be 
ones which are related to the topic being studied. If we are making a 
health study of male students in a college, we might recognize such strata 
as those who do or do not live at home; those who are totally, partially, 
or not at all self-supporting; those who do or do not take regular exercise; 
those who do or do not smoke; and so forth. However, there are other 
strata which clearly have no bearing on the problem. To take an extreme 


⁶ The clusters are sometimes called “primary sampling units” and the items in the
clusters termed “elementary sampling units.”

⁷ See Morale Division, the United States Strategic Bombing Survey, The Effects
of Strategic Bombing on Japanese Morale, [Washington], 1947; Appendix I. Out of
print.

illustration, we might recognize such strata as those who habitually wear
caps or hats, those who prefer single- or double-breasted coats, or any 
other categories which are not related to health. Another important 
consideration is that stratified sampling is most advantageous when the 
strata differ from each other as much as the population will allow, but 
there should be homogeneity within each stratum. 

Many public opinion and market research organizations make use of 
the principle of stratified sampling. Sometimes enumerators may be told 
to work within a given city block (a geographical stratum) and talk with
a given number of people selected at random. The selection, too often,
is not a random one, consisting as it does of those who are at home, those
willing to be interviewed, and those who, by their appearance, look as if
they would be willing to talk.

For a non-homogeneous population, a properly stratified sample may 
be expected to yield more reliable⁸ results than a random sample of the
same size. From this it follows that the same reliability may be had from 
a smaller stratified sample. There is some danger that investigators, 
having an excessive feeling of security in the stratified sample, may use 
samples that are too small to give statistically reliable results. This can 
be guarded against by an intelligent use of the method and of the
reliability formulas.⁹ Although both proper stratification and size of sample
are important, a large sample cannot compensate for poor stratification. 
Of course, a stratified sample taken from a homogeneous population is no 
more reliable than a random sample of the same size. 

(e) Sequential sampling. Sequential sampling has been used most 
widely in connection with quality-control schemes having to do with raw 
material or a manufactured product, but it is gradually coming to have 
other applications.¹⁰ It involves testing a relatively small number of
items which may lead to a decision to accept or reject the lot from which 


⁸ In this text we shall consider (in Chapters 24, 25, and 26) the error formulas for
random samples only. An understanding of the behavior of random samples is a
necessary groundwork for evaluating samples obtained by more complex procedures.
Error formulas for other types of samples may be found in H. M. Walker and J. Lev,
Statistical Inference, Henry Holt and Company, New York, 1953, pp. 173-177; in
M. H. Hansen, W. N. Hurwitz, and W. G. Madow, Sample Survey Methods and Theory,
Vol. I, John Wiley and Sons, Inc., New York, 1953; and in W. G. Cochran, Sampling
Techniques, John Wiley and Sons, Inc., New York, 1953.

⁹ See footnote 8.

¹⁰ Applications in commercial research are described and the process of sequential
sampling explained in Robert Ferber, Statistical Techniques in Market Research,
McGraw-Hill Book Company, Inc., New York, 1949, Chapter VII. A more complete
explanation of sequential analysis is given by the originator, Abraham Wald,
in his book Sequential Analysis, John Wiley and Sons, Inc., New York, 1947.



Chap. 2] STATISTICAL DATA 31


the sample came. If the first sample leads to no clear decision, it is
enlarged (possibly one item at a time) until a decision can be made. 
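
One well-known form of this procedure is Wald's sequential probability
ratio test, which is the subject of his book cited in the footnote. The
sketch below applies it to the fraction defective in a lot, using Wald's
approximate stopping limits. The function name, the two fractions
defective, and the two risks are assumptions chosen for illustration and
are not taken from the text.

```python
import math
import random


def sequential_test(items, p_good, p_bad, alpha, beta):
    """Inspect items one at a time (True means defective) and stop as soon
    as the evidence is strong enough to accept or reject the lot.

    p_good: fraction defective regarded as acceptable
    p_bad:  fraction defective regarded as unacceptable (p_bad > p_good)
    alpha:  tolerated risk of rejecting a good lot
    beta:   tolerated risk of accepting a bad lot
    """
    upper = math.log((1 - beta) / alpha)  # at or above this: reject the lot
    lower = math.log(beta / (1 - alpha))  # at or below this: accept the lot
    log_ratio = 0.0
    inspected = 0
    for defective in items:
        inspected += 1
        if defective:
            log_ratio += math.log(p_bad / p_good)
        else:
            log_ratio += math.log((1 - p_bad) / (1 - p_good))
        if log_ratio >= upper:
            return "reject", inspected
        if log_ratio <= lower:
            return "accept", inspected
    return "undecided", inspected


rng = random.Random(11)
lot = [rng.random() < 0.02 for _ in range(500)]  # hypothetical lot
decision, inspected = sequential_test(
    lot, p_good=0.02, p_bad=0.10, alpha=0.05, beta=0.10
)
print(decision, "after inspecting", inspected, "items")
```

With these assumed values an unbroken run of 27 good items is enough to
accept the lot, while two defectives at the very start are enough to
reject it; mixed results call for further items.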

(f) Other types of samples. The five types of samples previously
described are sometimes referred to as “probability samples,” since it
is possible to ascertain the probability that an individual item is included
in the sample. Other sampling schemes, differing from those already
described, also exist. They are not considered desirable procedures since 
they involve subjective factors, or their reliability cannot be ascertained 
satisfactorily, or both. Among these are: (1) the purposive sample, in
which one sets out to make a sample agree with the population in regard 
to certain characteristics — for example, average income and size of family; 
(2) the quota sample in which interviewers, working in a certain area, 
are instructed to talk with individuals having particular characteristics 
(If interviewers are told to talk with 10 native-white males, 4 Negro 
males, and 3 foreign-born males, it is more than likely that the foreign 
born who are interviewed will be those who are able to speak English 
well enough to be conversed with satisfactorily. This would introduce 
a bias into most studies, since the population actually studied would not
be the population which was intended to be studied.); (3) the random
point sample, which consists of locating many points at random on a map 
and enumerating a predetermined number of sampling units nearest to 
each point (This procedure is occasionally used for sampling farms, but 
through its use large farms are more likely to be included than are small 
farms.). 

When deciding which sampling plan to use, the investigator must con- 
sider the efficiency of the scheme. It has already been noted that a 
stratified sample yields more reliable results (that is, its sampling error 
is smaller) than does a random sample of the same size. Cluster sampling 
may be expected to yield less reliable results than random sampling for
samples of the same size. The efficiency of a sample scheme refers to 
the reliability in relation to unit cost. Thus, a geographic cluster sample 
with groups of units in, say, 20 locations in a large state may have a lower 

¹¹ See the references given in footnote 8.

¹² A good discussion of quota sampling may be found in F. Mosteller and others,
The Pre-Election Polls of 1948, Social Science Research Council, New York, 1949,
pp. 83-91 and 94-96. The danger of using a quota sample is well illustrated on
page 96.

¹³ The distinction between sampled population and target population (and other
principles of sampling) is treated in “Principles of Sampling,” by W. G. Cochran,
F. Mosteller, and J. W. Tukey, Journal of the American Statistical Association,
March, 1954, pp. 13-36.

cost per sampling unit than a random sample of the same size with the
units scattered here and there about the state. The difference in unit
cost may be so great that the cluster sample may be made enough larger
than the random sample so that the cluster sample will yield more reliable 
results than could be had from a random sample for the same expenditure. 

A sample may be selected by use of a combination of the methods 
previously discussed. Here is the procedure followed by the American 
Institute of Public Opinion¹⁴ in 1953:


The regular sample for the national surveys of the American Institute 
of Public Opinion is a sample of the adult population. Provision is 
made for selecting from the regular sample a sample of an approximation 
of the voting population when such is desired. The design provides 
stratification by seven regions (groups of states), and within each region 
stratification by geographical distribution, three rural-urban strata, the 
census economic areas, and the size of the locality finally selected. A 
systematic sample of localities was drawn within each stratum from a
random start with probability of selection proportional to size. Within
large urban communities sampling units¹⁵ (small clusters of blocks)
were drawn at random with probability proportional to size. In smaller 
communities and rural regions sampling areas were drawn with equal 
probability. 

Interviewers are assigned selected areas, and required to work within 
the boundaries of such areas. Each national survey uses about 150 
sampling points, with equal numbers of interviews assigned to each 
point. A staff of over 1,000 interviewers is maintained. 
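
The phrase "systematic sample ... from a random start with probability of
selection proportional to size" in the statement quoted above describes
a standard device, which may be sketched as follows. This is a general
illustration and not the Institute's own procedure; the localities,
their sizes, and the function name are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pps_systematic(names, sizes, n, rng):
    """Systematic selection with probability proportional to size.

    The sizes are laid end to end, a random start is taken within the
    first sampling interval, and every interval-th point thereafter
    selects the locality in whose range it falls. A locality larger
    than the interval can be selected more than once.
    """
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n
    start = rng.random() * interval
    picks = []
    for k in range(n):
        target = start + k * interval
        # Guard against a floating-point target landing past the last range.
        index = min(bisect_right(cumulative, target), len(names) - 1)
        picks.append(names[index])
    return picks


rng = random.Random(1953)
names = ["A", "B", "C", "D", "E"]  # hypothetical localities
sizes = [600, 100, 100, 100, 100]  # hypothetical populations
picks = pps_systematic(names, sizes, 4, rng)
print(picks)
```

Here the sampling interval is 250, so that a locality of 100 persons has
a chance of 100/250 of being selected, while locality A, being larger
than two intervals, is certain to be selected at least twice.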


Sometimes a sample is taken in a more or less haphazard fashion. Or, 
the investigator may include the data which are convenient or readily 
available, after which he will trustingly announce that the sample so 
taken is doubtless representative of the population which he is studying. 
For example, one researcher, who had ascertained that just under 
2,500,000 children, eligible to be enrolled in high school, were not enrolled, 
desired to estimate how many of these 2,500,000 left school because of 
economic pressure. He managed to locate 16 acceptable studies concern- 
ing the reasons why students left school. These studies each included 53 
to 274 children, a total of 2,525. The studies were made in schools in 13 
different states. Negroes were studied in one instance. There were no 
figures from New York, Massachusetts, Illinois, Michigan, Wisconsin, 
Texas, and certain other populous states. Yet, because the geographical 
distribution was diverse and because large-city, small-city, and rural 
children were included, the investigator concluded: “The sample seems 
sufficiently representative of the various elements of the population to 


¹⁴ Correspondence from Dr. George H. Gallup, Director of the American
Institute of Public Opinion.

¹⁵ These are apparently “primary sampling units.” See footnote 6.

serve as the basis for estimation of the whole group.” This may or may
not have been true. The sample was neither random, stratified,
systematic, nor cluster; it merely included what was available.

As will be shown in Chapters 24, 25, and 26, for random samples, the 
larger the sample, the more confidence we can place in conclusions drawn 
from the sample. It will also be shown that the greater the diversity 
there is in the population, the less reliability we can repose in samples of 
the same size. Mere size, of course, does not assure representativeness in 
a sample. A small random or stratified sample is apt to be much superior 
to a larger but badly selected sample. Sometimes a test of stability is*" 
made to determine when a sample is large enough. For example, a 
sample of 1,000 may be selected from a group of voters, and 57.3 per cent 
of the sample may indicate that they intend to vote for a certain
candidate. Another 1,000 may be chosen, and the two groups combined may
show 56.9 per cent. Adding another 1,000 may change the percentage to
56.8, and still another 1,000 (4,000 in all) may leave the proportion
unchanged, at 56.8. From this test, 3,000 or 4,000 would seem to be an 
adequate sample from the standpoint of size. However, the test of 
stability tests only stability and not representativeness. The fact that a 
percentage persists essentially unchanged means merely that we are con- 
tinuing to get about the same result as before. Conceivably, the first 
sample of 1,000 could have been decidedly unrepresentative (say, from 
only the poorer sections of the voting population), and each succeeding 
sample similarly unrepresentative. 
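
The point may be demonstrated with an invented electorate in which the
poorer sections favor the candidate more than the remainder do. All of
the figures below are assumptions chosen for illustration.

```python
import random

rng = random.Random(1936)

# Hypothetical electorate of 200,000 voters; 1 marks a voter who intends to
# vote for the candidate. Assumed support: 57% in the poorer sections and
# 43% in the rest, or 50% in the electorate as a whole.
poorer_sections = [1] * 57_000 + [0] * 43_000
other_sections = [1] * 43_000 + [0] * 57_000
true_percentage = 100 * (sum(poorer_sections) + sum(other_sections)) / 200_000

# Every voter sampled comes from the poorer sections only.
draws = rng.sample(poorer_sections, 4000)

# Cumulative percentage after 1,000, 2,000, 3,000, and 4,000 voters.
running = [100 * sum(draws[:n]) / n for n in (1000, 2000, 3000, 4000)]

print("cumulative percentages:", [round(p, 1) for p in running])
print("percentage in the whole electorate:", true_percentage)
```

The cumulative percentage settles down as the sample grows, yet it
settles near the figure for the poorer sections and not near that for
the electorate.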

Mention has already been made of the possibility of bias being present 
in a sample. When a sample is being selected, it is important that bias 
be avoided. Bias does not mean the personal bias of the investigator 
which leads him deliberately to select his sample in order to show the 
results he desires. That is intellectual dishonesty. Neither does it 
mean that the persons answering the questions on the schedule are 
biased. The avoidance of bias involves, first, that there shall be no 
selective factor present in the drawing of the sample, and, second, that 
there shall be no selective factor present when schedules are returned 
from those persons included in the sample. In the case of the Literary 
Digest 1936 straw vote, a selective factor was present because the basic 
lists from which the sample was selected did not include the lower eco- 
nomic levels of the population. Sometimes the basic list may be com- 
plete, but the method of selecting the sample may introduce bias. Thus, 
a selection from an alphabetical list of names may be unsatisfactory 
because of nationality differences in the alphabetical distribution of 
family names. Such a bias may arise if sections of the list are chosen; it 
is not likely if (say) every tenth name is taken. 





The second type of selective factor is frequently encountered if the 
mailed-questionnaire method of collection is used. When schedules are
sent out by mail, an investigator never expects that all of them will be
returned. Since only part of the inquiries are answered, how can he be 
sure that those who did answer are representative of all those to whom 
schedules were sent? Often he cannot be sure; sometimes it is obvious 
that they are not representative. An alumni association sent out 363 
inquiries to graduates, asking each to report (anonymously) his income 
for the preceding year. Replies were received from 133. It is quite 
likely that a selective factor was present in these returns. Alumni who
were out of work or who had very low incomes probably did not reply. 
This assumption is borne out by the data, which showed an almost com- 
plete absence of incomes below $1,500, although the study was made in a 
depression year. Conclusions based upon biased samples are, obviously, 
not only useless but misleading. 
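
The effect of such a selective factor may be shown by a small simulation.
Only the counts of 363 inquiries and 133 replies come from the study
described above; the incomes and the chances of replying are invented
for illustration.

```python
import random

rng = random.Random(363)

# Proportion of inquiries answered in the alumni study described above.
response_rate = 133 / 363

# Hypothetical incomes (dollars per year) for 363 alumni; the distribution
# is invented for illustration only.
incomes = [rng.choice([0, 800, 1200, 1800, 2500, 4000]) for _ in range(363)]


def replies(income):
    """Assumed behavior: alumni with incomes under $1,500 seldom answer."""
    chance = 0.05 if income < 1500 else 0.65
    return rng.random() < chance


returned = [income for income in incomes if replies(income)]

mean_all = sum(incomes) / len(incomes)
mean_returned = sum(returned) / len(returned)
share_low_all = sum(i < 1500 for i in incomes) / len(incomes)
share_low_returned = sum(i < 1500 for i in returned) / len(returned)

print(f"response rate in the actual study: {response_rate:.1%}")
print(f"mean income, all alumni: {mean_all:.0f}; replies only: {mean_returned:.0f}")
print(f"share under $1,500, all alumni: {share_low_all:.0%}; "
      f"replies only: {share_low_returned:.0%}")
```

Under these assumptions the replies overstate the average income and
nearly conceal the low incomes, although no individual reply is false.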

4. Using the schedules to obtain the information. When agents 
or enumerators take the schedules to the persons who are to furnish the 
information, the enumerators may explain the purpose of the investiga- 
tion and solicit cooperation. Each question can be clearly explained as 
it is asked. Obviously, enumerators must be carefully instructed before 
they begin their work. Occasionally they are required to study the 
schedule and printed instructions, and then to take an examination. 
Enumerators should be persons of unquestioned integrity and should also 
be patient, polite, and tactful. Many a person resents being bothered 
to supply statistical (or other) information; some persons are reluctant; 
some refuse. The enumerator should plan his interviews to consume as 
little time as possible, and should bend every effort to get the desired 
information if it is feasible to do so. In some instances the work of the 
enumerator may be facilitated if a letter of explanation precedes the visit. 
Sometimes enumerators conduct interviews and fill in the schedules after- 
ward. This is done on the theory that people feel more free to talk if the 
remarks are not being written down at the time. It is believed, however, 
that this is an undesirable procedure, especially when there are a number 
of facts to be remembered and later recorded. Enumerators should 
carry credentials in order that the persons visited may be satisfied as to 
the official connection of the visitor. Even though an enumerator 
makes his request for information as tactfully as possible, he may some- 
times meet with a refusal. Frequently another visitor with a different 
approach may have better luck. It is sometimes a good plan to have 
one especially qualified worker who will follow up the more difficult cases. 

Occasionally an enumerator may encounter a person who is too willing 
to cooperate and who wants to talk at great length about the study. In 
such a situation good terminal facilities are an asset. Carl Crow states 
that Chinese, when asked certain types of questions, are apt to give 
answers which they think will please the questioner. If an English 
investigating commission asks young Chinese where they want to go to 
school, they are likely to reply, “England.” The same author tells 
of an investigation made in Amoy, where, because of a lack of proper 
death registration, the number of persons dying was estimated from 
figures of the number of coffins made. The figures of coffin production 
mounted, showing the development of an epidemic; but, after the 
epidemic was definitely known to have declined, the figures of coffins 
made remained high. Upon close inquiry it developed that the coffin 
manufacturers had continued to report peak production of coffins so that 
the agent of the health officials would not lose his job. They did not 
want to “break his rice bowl.” 

Carl Crow, Four Hundred Million Customers, Harper and Brothers, New York, 
1937, pp. 132-133. 

Ibid., pp. 252-253. 

Sending schedules by mail rather than using enumerators is, at the 
outset, a less expensive method of collecting data. There is also the 
added advantage that the person supplying the information can fill out 
the form at his convenience, instead of being disturbed by the enumerator 
perhaps at a busy or inconvenient time. Furthermore, on a mail question- 
naire (provided, of course, that the informant is sure his identity is 
unknown), confidential information may be given which the informant 
would hesitate to divulge to an enumerator. On the other hand, a large 
proportion of persons fail to reply to a mail inquiry and considerable 
follow-up work may be necessary. There is also great danger that the 
informant will not understand the questions, or will knowingly or other- 
wise make incorrect answers. Not only must clear, concise directions be 
sent with the schedule, but also a brief letter explaining the purpose of the 
inquiry and requesting cooperation. A modest gift (such as the coin 
sent by the Curtis Publishing Company) may insure a high proportion 
of returns. In any event, an addressed and stamped (or business reply) 
envelope should be included. An air mail business reply envelope (or 
card) is occasionally used by investigators with the hope that it will 
result in more and quicker responses. When follow-up work is necessary, 
the persons who have not yet returned their forms may be sent courteous 
personal letters reminding them of the inquiry and again requesting 
cooperation. When appropriate, the follow-up may be by means of air 
mail letters, special delivery letters, registered letters (to be sure the com- 
munication has been delivered), telegrams, or telephone calls. Of course, 
the investigator should not make a nuisance of himself; he should not be 
too insistent. When only part of the schedules are finally received, it is 
necessary to examine the situation carefully to be sure that no selective 
factor has been present. Or, if a selective factor appears to be present, 
it may be necessary to conduct a supplementary investigation to remedy 
the situation. 

5. Editing the schedules. After the filled-out schedules are 
received, a certain amount of preparatory work is necessary before the 
data are in shape to be tabulated. The editorial tasks are varied. In 
the case of a small study, one editor may do the entire work. In a larger 
study, different phases of the editing may be portioned out among a 
number of editors. 

(a) Computing. It is usually better not to ask enumerators or persons 
supplying information to make any computations. Thus, if information 
has been obtained concerning the number of rooms in a home and the 
number of members in the household, the editor may compute the ratio 
of persons per room, to give some idea of crowding. If data have been 
collected concerning the time lost through non-compensated accidents 
and also of daily wages for each of a number of workers, the editor may 
compute for each case the income lost because of accidents. 
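
The two computations just mentioned may be sketched as follows. This is a minimal illustration rather than a prescribed procedure: the function names are hypothetical, the sample figures are invented for the example, and time lost is assumed to be recorded in days.

```python
def persons_per_room(members, rooms):
    """Ratio of household members to rooms, a rough index of crowding."""
    if rooms <= 0:
        raise ValueError("number of rooms must be positive")
    return members / rooms


def income_lost(days_lost, daily_wage):
    """Income lost by one worker through non-compensated accidents."""
    return days_lost * daily_wage


# Hypothetical entries, chosen only to show the arithmetic.
crowding = persons_per_room(members=6, rooms=4)     # 1.5 persons per room
loss = income_lost(days_lost=12, daily_wage=9.50)   # 114.0 dollars
```

Leaving such arithmetic to the editor means that every schedule is computed in the same way, and a doubtful figure can be traced to the original entries.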

(b) Coding. Tabulation is frequently facilitated by coding. When 
machine tabulation (to be discussed shortly) is used, all entries on a 
schedule are reduced to a numerical code. Even when tabulation is 
manual, it may still be easier to look for a code mark — letters, numbers, 
or a combination of letters and numbers — instead of attempting to read 
the original entry. The work of the tabulator may be further facilitated 
by the fact that the editor writes, or should write, legibly and uses a 
distinctive color, often red. 

The unemployment schedule is shown edited according to a numerical 
code on page 38. Every entry is numerically coded, except those already 
expressed as numbers, in order to facilitate tabulation by mechanical 
means. Note that question 7 was self-coded. A simple code scheme for 
questions 5 and 6 might appear as follows: 

10. Professional 
20. Clerical (not otherwise specified) 
30. Domestic and personal service 
40. Government employees (other than teachers) 

Trade and Transportation 
    50. Retail and wholesale trade 
    51. Telephone and telegraph 
    52. Railway, express, gas, electric light 
    53. Water transportation 
    54. Bank and brokerage 
    55. Insurance and real estate 
    56. Other 

Manufacturing and Mechanical Pursuits 
    60. Building trades, contractors 
    61. Building trades, wage earners 
    62. Clay, glass, and stone products 
    63. Food and kindred products 
    64. Iron, steel, and their products 
    65. Metal products, other than iron and steel 
    66. Paper, printing, and publishing 
    67. Wearing apparel and textiles 
    68. Automobiles, parts, and tires 
    69. Lumber and furniture 
    70. Airplanes 
    71. Other manufacturing and mechanical pursuits 

75. Labor (not otherwise specified) 
80. Self-employed (other than 10 or 60) 
90. Miscellaneous employments not classified above 
00. Not reported 
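
A code scheme of this kind is, in effect, a lookup table. The sketch below holds a few of the codes listed above in a dictionary; the function name, the matching on lower-case text, and the choice to raise an error for unlisted entries are assumptions made for illustration. The codes are kept as text so that "00" does not lose its leading zero.

```python
# A few of the industry codes for questions 5 and 6, keyed by the
# designation as it might be written on the schedule.
INDUSTRY_CODES = {
    "professional": "10",
    "clerical": "20",
    "telephone and telegraph": "51",
    "food and kindred products": "63",
    "lumber and furniture": "69",
    "airplanes": "70",
}
NOT_REPORTED = "00"


def code_industry(entry):
    """Return the two-digit code for an industry entry.

    A blank entry is coded "00" (not reported). An entry that is not in
    the table raises KeyError, so that the editor classifies it by hand
    instead of having it guessed at.
    """
    key = entry.strip().lower()
    if not key:
        return NOT_REPORTED
    if key not in INDUSTRY_CODES:
        raise KeyError(f"no code assigned for {entry!r}")
    return INDUSTRY_CODES[key]


codes = [code_industry(e) for e in ["Professional", " Airplanes ", ""]]
# codes is ["10", "70", "00"]
```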

(c) Deciphering. The handwriting of an enumerator or of an inform- 
ant may occasionally be difficult to read. This is especially true when 
an enumerator makes entries on a schedule while he is outdoors in the 
rain or snow. Deciphering such copy is the editor’s task; he not only 
saves time for the tabulator, but also insures accurate results. If entries 
are literally unreadable, the schedule may have to be referred back to the 
enumerator or the person who sent in the information. 

(d) Checking. The editor may look over the schedules for incon- 
sistencies. Entries of age and date of birth may disagree. Something 
is probably awry if an individual reported as aged 8 is also shown to be 
married. Similarly, a mistake has probably (though not necessarily) 
been made if a woman is reported working full time as a blacksmith. 
Such entries must be verified if they are to be used. 
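
Checks of this sort can be written down as explicit rules. The following sketch is an assumption-laden illustration, not the procedure of any actual study: the field names, the minimum age of 14 for marriage, and the two sample schedules are hypothetical, and March 20, 1954 is used as the reference date because it closes the survey week named on the schedule.

```python
from datetime import date

SURVEY_DATE = date(1954, 3, 20)   # end of the survey week on the schedule
MINIMUM_MARRIAGE_AGE = 14         # assumed threshold, for illustration only


def age_on(born, as_of):
    """Age in completed years on the date `as_of`."""
    had_birthday = (as_of.month, as_of.day) >= (born.month, born.day)
    return as_of.year - born.year - (0 if had_birthday else 1)


def find_inconsistencies(record, as_of=SURVEY_DATE):
    """Return a list of entries the editor should verify."""
    problems = []
    age = record.get("age")
    born = record.get("birth_date")
    if age is not None and born is not None and age_on(born, as_of) != age:
        problems.append("age and date of birth disagree")
    if (age is not None and age < MINIMUM_MARRIAGE_AGE
            and record.get("marital_status") == "married"):
        problems.append("child reported as married")
    return problems


# Hypothetical schedules.
child = {"age": 8, "birth_date": date(1946, 1, 5), "marital_status": "married"}
adult = {"age": 40, "birth_date": date(1910, 6, 1), "marital_status": "single"}
flags_child = find_inconsistencies(child)
flags_adult = find_inconsistencies(adult)
```

The rules only flag an entry for verification; they do not decide which of two conflicting entries is the wrong one.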

(e) Examining for completeness. The editor must also scrutinize the 
schedule to see if any entries are missing or incomplete. If the missing 
information is important, the schedule must be referred back to the 
enumerator or to the informant. Otherwise, the editor writes “N.R.” 
(not reported) or the corresponding numerical code in place of the missing 
information. 
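
The rule for missing entries might be sketched as below. Which items are important enough to justify a call-back is a matter for the investigator; the division into `REQUIRED` and `OPTIONAL` fields shown here, like the field names themselves, is purely hypothetical.

```python
REQUIRED = ("age", "sex")              # assumed: worth referring back
OPTIONAL = ("race", "hours_worked")    # assumed: may be marked "N.R."


def is_missing(value):
    """An entry counts as missing if it is absent or blank."""
    return value is None or value == ""


def examine_for_completeness(record):
    """Return the edited record and the items to be referred back."""
    edited = dict(record)              # leave the original schedule untouched
    refer_back = [f for f in REQUIRED if is_missing(record.get(f))]
    for field in OPTIONAL:
        if is_missing(record.get(field)):
            edited[field] = "N.R."
    return edited, refer_back


schedule = {"age": 34, "sex": "", "race": None, "hours_worked": 40}
edited, refer_back = examine_for_completeness(schedule)
```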

6. Organizing the data. After the schedules have been edited, the 
data must be organized before finished tables and charts can be made. 
There are three methods that may be used: 

(1) The score or tally sheet. For purposes of illustration, let us con- 
sider a score sheet to show, by industry, for male heads of households the 
number of hours worked during the week ending March 20, 1954. The 
score sheet is shown on page 40 and represents the data from all of the 
edited cards for male heads of households from one area of the com- 
munity. The numerical coding of the industry groups is not necessary 
for hand tabulation (which includes both scoring and hand sorting, 
described in the next subsection), but it saves space in the tally sheet to 
use the code numbers instead of the full industry designation. Numerical 
coding is necessary when mechanical tabulation is employed. 

Name ........  Address ........ 
Area ....  Household ....  Card ....  Enumerator .... 

2. Age ....  3. Sex ....  4. Race .... 

5. Regular employment: 
      Occupation .... 
      Industry .... 

6. Present employment: 
      Occupation .... 
      Industry .... 

7. Circle one number to indicate what this person was primarily doing 
   during the week ending March 20, 1954. 

      01 Working for compensation in money or “kind.” 
      02 Self-employed. 
      Has a job or is self-employed, but not at work because: 
         03 On vacation. 
         04 Bad weather. 
         05 Labor dispute. 
         06 Layoff of 30 days or less. 
         07 Own sickness. 
         08 Other. 
      09 Not at work, new job to begin within 30 days. 
      10 Not at work, looking for work. 
      11 Casual worker, no regular job. 
      12 Attending school. 
      13 In the armed forces. 
      14 Keeping house (not as employee). 
      15 Unpaid worker on family farm or in family business. 
      16 Volunteer worker, not on family farm or in family business. 
      17 Retired. 
      18 Physically or mentally unable to work. 
      19 Inmate of institution. 
      20 Other. 

8. If this person worked at all last week, for compensation, or on family 
   farm or in family business, or as a self-employed person, how many 
   hours did he or she work? .... hours. 

9. If this person was looking for work, how many weeks has he or she 
   been seeking employment? .... weeks. 

Remarks .... 

Urbantown Employment-Unemployment Study, 1954 

Edited “Urbantown Employment-Unemployment Schedule.” 

Observe that the score marks are arranged in groups of five, four verti- 
cal and a diagonal. This facilitates counting. The second set of score 
marks is for checking purposes. Since the tally sheet is for but one area, 
it is necessary to combine the results from a number of such tally sheets 
to arrive at the figures for the entire community. The resulting table 
might appear as in Table 2.1. 

The score sheet is a useful device for organizing information from a 
small study. However, if there are many schedules to be scored or if it 
is desired to subdivide classifications, the score sheet becomes cumber- 
some. For example, if we wish to use the same categories of hours as 
shown on the score sheet but to show also males and females and at the 
same time distinguish between those who are heads of households and 
those who are not, we might have two major categories, “head of house- 
hold” and “not head of household.” Each of these would be divided 
into “male” and “female,” and each of these four categories further sub- 
divided into the classes shown in the tally sheet on page 40. This would 
call for 4 × 6 = 24 columns and would result in a very sizeable tally 
sheet. It could, of course, be broken down into several score sheets, but 
it would be even better to use a different method of organizing the data. 
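
The scoring just described amounts to counting cards by category. The sketch below is an illustration under stated assumptions: the five cards are hypothetical, the tuple layout is invented for the example, and a `Counter` from the Python standard library stands in for the tally sheet. The six hours classes are those of the score sheet.

```python
from collections import Counter

HOURS_CLASSES = [
    "35 or more",
    "28 but less than 35",
    "21 but less than 28",
    "14 but less than 21",
    "7 but less than 14",
    "less than 7",
]


def hours_class(hours):
    """Place a number of hours in one of the six score-sheet classes."""
    for lower_limit, label in zip((35, 28, 21, 14, 7), HOURS_CLASSES):
        if hours >= lower_limit:
            return label
    return HOURS_CLASSES[-1]


# Hypothetical edited cards: (head of household?, sex, industry code, hours).
cards = [
    (True, "male", "63", 40),
    (True, "male", "63", 44),
    (True, "male", "63", 30),
    (True, "female", "50", 6),
    (False, "female", "20", 40),
]

# One score mark per card; the Counter plays the part of the tally sheet.
tally = Counter(
    (industry, head, sex, hours_class(hours))
    for head, sex, industry, hours in cards
)

# Column headings of the enlarged sheet: 2 x 2 x 6 = 24.
columns = [
    (head, sex, label)
    for head in (True, False)
    for sex in ("male", "female")
    for label in HOURS_CLASSES
]
```

Here `len(columns)` is 24, which agrees with the count given in the text, and each industry code would occupy one row of the sheet.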

(2) Hand sorting. When a study does not involve too large a number 
of schedules, and when the schedules are small enough and on cardboard 
or heavy paper, so that they can be handled readily, the data may be 
organized by a process of manual sorting. If we wished to obtain the 
information mentioned in the preceding paragraph, we might: (1) sort 
the cards into four piles— male heads of households, female heads of 
households, male non-heads, and female non-heads; (2) sort each of the 
four piles into the 27 industry categories, giving a maximum of 108 piles; 
and (3) sort each of these piles into the hours-of-work categories shown 
on page 40. The cards in each pile would then be counted to obtain the 
desired figures. 
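
The three successive sorts can be imitated by dealing records into piles and then dealing each pile again. The following is a sketch only: the helper `sort_into_piles`, the dictionary layout of a card, and the four sample cards are hypothetical.

```python
from collections import defaultdict


def sort_into_piles(cards, key):
    """Deal the cards into piles, one pile for each value of `key`."""
    piles = defaultdict(list)
    for card in cards:
        piles[key(card)].append(card)
    return dict(piles)


# Hypothetical cards, already coded by industry and hours class.
cards = [
    {"head": True, "sex": "male", "industry": "63", "hours": "35 or more"},
    {"head": True, "sex": "male", "industry": "63", "hours": "35 or more"},
    {"head": True, "sex": "male", "industry": "20", "hours": "less than 7"},
    {"head": False, "sex": "female", "industry": "20", "hours": "35 or more"},
]

counts = {}
# (1) head or non-head, by sex; (2) industry; (3) hours class.
first_sort = sort_into_piles(cards, lambda c: (c["head"], c["sex"]))
for group, pile in first_sort.items():
    second_sort = sort_into_piles(pile, lambda c: c["industry"])
    for industry, sub_pile in second_sort.items():
        third_sort = sort_into_piles(sub_pile, lambda c: c["hours"])
        for hours, final_pile in third_sort.items():
            counts[(group, industry, hours)] = len(final_pile)

# Four groups times 27 industry categories gives at most 108 piles
# after the second sort, as stated in the text.
MAX_PILES_AFTER_SECOND_SORT = 4 * 27
```

Only the piles that actually contain cards are created, so a small study produces far fewer than the maximum number.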

[Score sheet form. Heading: Area, Scored by, Checked. Title: Industry 
and Hours Worked, Male Heads of Households. Columns: industry group; 
35 hours or more; 28 but less than 35 hours; 21 but less than 28 hours; 
14 but less than 21 hours; 7 but less than 14 hours; less than 7 hours.] 

Industry and Hours Worked, Male Heads of Household. 

(3) Mechanical tabulation. Mechanical tabulation involves the same 
basic procedure as hand sorting, but it is much faster. Mechanical 
sorting and tabulating (counting and totaling) devices enable the work 
of organizing the information of a statistical study to be done most expe- 
ditiously, provided, of course, that the study is extensive enough to 
warrant the use of such equipment. The use of mechanical tabulating 
equipment is recommended when there is a large number of schedules to 
be analyzed or when there are numerous entries on each schedule. The 
process consists essentially of the following steps: 


(a) Transforming all entries on the schedule into numerical terms, 
using appropriate codes. 

(b) Recording these entries on a punch card by punching holes to 
represent the code numbers. A card-punch machine is shown on 
page 43. 

(c) Sorting the cards and assembling the data by the use of machines. 

The data shown in this table are for illustrative purposes. They do not represent an actual enumeration. 

On page 44 there is shown a blank punch card and also an enlarged 
portion of a card, punched to represent the data of the edited schedule on 
page 38. The first entry on the card (103) identifies the area from which 
the schedule came. The next entry, using four columns, identifies the 
household and enables the cards for each household to be brought 
together, if desired. The following two columns indicate the number of 
the card within the household, since there may be several cards for a 
household. The first nine numbers taken together make it possible to 
bring together any schedule and the punch card made from it, if desired. 
The next column shows by a “1” that the individual was the head of a 
household; a “2” would indicate that he was not a head. Age is shown 
in the two following columns. In the next column, “1” indicates that 
the respondent was male; for a female, “2” is punched. The next 
column indicates race by these numbers: 1, native white; 2, native 
colored; 3, foreign born; 4, other; 0, not reported. The industry code, 
which has already been given, occupies the next four columns, two 
columns for regular employment and two for present employment. Two 
more columns take care of the answers to the self-coding Question 7. 
Question 8 calls for a numerical answer, which occupies the next two 
columns. The last three columns take care of the numerical answers to 
Question 9. Note that it was necessary to use only part of the punch 
card for this schedule. 
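
The column assignments described above define a fixed-width record of 25 columns. As a sketch, the layout can be written as a list of field names and widths and used to read a card back into its entries. The field names are hypothetical, and every value on the sample card except the area number 103 is invented for illustration.

```python
# Field widths follow the description in the text; the names are assumed.
LAYOUT = [
    ("area", 3),
    ("household", 4),
    ("card", 2),
    ("head", 1),               # 1 = head of household, 2 = not a head
    ("age", 2),
    ("sex", 1),                # 1 = male, 2 = female
    ("race", 1),
    ("regular_industry", 2),
    ("present_industry", 2),
    ("activity", 2),           # self-coded Question 7
    ("hours", 2),              # Question 8
    ("weeks_seeking", 3),      # Question 9
]
CARD_WIDTH = sum(width for _, width in LAYOUT)   # 25 columns


def read_card(columns):
    """Split a string of punched digits into named fields."""
    if len(columns) != CARD_WIDTH:
        raise ValueError(f"expected {CARD_WIDTH} columns, got {len(columns)}")
    fields = {}
    position = 0
    for name, width in LAYOUT:
        fields[name] = columns[position:position + width]
        position += width
    return fields


# Hypothetical card, assembled field by field in layout order.
card = "".join(["103", "0042", "01", "1", "37", "1", "1",
                "63", "63", "02", "40", "000"])
fields = read_card(card)
schedule_id = card[:9]   # area + household + card number
```

The first nine columns, taken together, serve as the identifier that links a card to its schedule, as the text notes.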

After the cards have been prepared, they are verified. This is accom- 
plished by reading each punched card against the schedule represented 
by it. The cards are examined by placing them over a source of light 
or over a black background. Alternatively, a special machine called a 
verifier may be used. The verifier resembles the card-punch machine, 
but it does not punch the cards. 

Following verification, the cards are sorted and tabulated by machine. 
The “electronic statistical machine,” shown on page 43, performs both 
of these operations. In addition to sorting, it will count and total and 
then print the results, for as many as 60 classifications, on the two rolls 
of paper shown at the rear of the machine. This machine will also check 
cards for consistency of information [see paragraph (d) under “Editing”] 
based on pre-established criteria. 

The devices pictured here are available from the International Business Machines 
Corporation, 590 Madison Ave., New York City. Punched card equipment may 
also be had from Remington Rand, Inc., 315 Fourth Ave., New York City; Burroughs 
Corporation, Special Machines Department, 219 Fourth Ave., New York City; and 
Underwood Corporation, Samas Punched Card Division, One Park Ave., New 
York City. 

A Portion of a Punch Card, Showing How the Edited Schedule on 
Page 38 Would Be Recorded. 

A simple device, useful for small studies, is known as Keysort and 
employs cards having holes around the edges. Information is recorded 
by notching away the portion of the card between the hole and the edge 
as shown: 

[Illustration of a notched card.] 

Notched and unnotched cards are separated by means of a large sorting 
needle. 

7. Presentation and analysis. After the information on the 
schedules has been organized by manual or mechanical means, the 
finished statistical tables and charts may be drawn up. Statistical tables 
are discussed in Chapter 3. Graphic presentation is considered in 
Chapters 4, 5, and 6. The analysis of statistical data is treated in 
Chapters 7 through 26. 

USING EXISTING SOURCES 

Primary versus secondary sources. As pointed out at the beginning 
of this chapter, statistical data may already exist which are suitable for 
use in a projected study. The data may or may not have been published. 
They may have been collected by an individual, a business firm, a research 
organization, a trade association, a local, state, or federal government 
office, a newspaper or magazine, and so forth. Some publications, such 
as the volumes of the United States Census of Population and Housing, 
contain only data which were collected by the issuing organization. Such 
sources are designated as primary. Other publications bring together 
data some or all of which were originally compiled by organizations other 
than the one responsible for the publication. These are referred to as 
secondary sources. The Survey of Current Business, published monthly by 
the Office of Business Economics of the U. S. Department of Commerce, is 
a secondary source, as it includes data from many governmental and non- 
governmental sources. Obviously it is preferable to make use of a 
primary source whenever possible, but it may often be more convenient 
to make use of a secondary source. One invaluable secondary source of 
data is the Statistical Abstract of the United States, issued annually by the 
U. S. Bureau of the Census. A number of other sources which are avail- 
able in many libraries are listed in Appendix U. 

The Keysort is sold by the McBee Company, 295 Madison Ave., New York, N. Y. 

The reasons for preferring a primary source are: 

(1) The secondary source may contain mistakes due to errors in tran- 
scription made when the figures were copied from the primary source. 

(2) The primary source frequently includes definitions of terms and 
units used. This is an important consideration, since intelligent use can 
hardly be made of data unless the user knows exactly what is meant by 
each term or unit employed by the collecting agency. When data are 
taken from several sources, it is particularly important that definitions 
of terms and units be scrutinized. The term “family” may sometimes 
have the limited meaning of father, mother, and offspring; sometimes it 
may be used more or less synonymously with “household.” The term 
“exports” may sometimes refer to gross exports (including re-exports); 
sometimes, to exports of United States merchandise only. Although a 
measured bushel is 2,150.4 cubic inches, a bushel does not represent the 
same number of pounds for all commodities. For example, a bushel of 
green peanuts in the shell weighs 22 pounds, a bushel of oats weighs 32 
pounds, and a bushel of apples weighs 45 pounds; but a bushel of wheat, 
beans, peas, or potatoes weighs 60 pounds. The Statistical Abstract of 
the United States, although a secondary source, includes the necessary 
definitions of units. 

(3) The primary source often includes a copy of the schedule and a 
description of the procedure used in selecting the sample and in collecting 
the data; the reader is thus enabled to ascertain how much confidence 
may be reposed in the findings of the study. 

(4) A primary source usually shows the data in greater detail. A 
secondary source often omits part of the information or combines cate- 
gories, such as showing counties instead of townships, or states instead 
of counties. 
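
To return to the bushel weights cited under point (2): quantities reported in bushels cannot be compared by weight until each is converted with the factor for its own commodity. A minimal sketch, using only the weights quoted in the text (the function name and the quantity of 100 bushels are assumptions):

```python
# Pounds per bushel, as quoted in the text.
POUNDS_PER_BUSHEL = {
    "green peanuts in the shell": 22,
    "oats": 32,
    "apples": 45,
    "wheat": 60,
    "beans": 60,
    "peas": 60,
    "potatoes": 60,
}


def bushels_to_pounds(commodity, bushels):
    """Convert a quantity in bushels to pounds for the given commodity."""
    return bushels * POUNDS_PER_BUSHEL[commodity]


oats_pounds = bushels_to_pounds("oats", 100)     # 3200
wheat_pounds = bushels_to_pounds("wheat", 100)   # 6000
```

The same 100 bushels thus represent nearly twice the weight for wheat as for oats.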


Suitability of data. The analyst should not make use of data, from 
either a primary or a secondary source, without assuring himself as to 
the reliability, accuracy, and applicability of the data. There are 
numerous points worthy of consideration here: 

(1) If the enumeration was based on a sample, was the sample repre- 
sentative? 

(2) Was the schedule well designed? Were any leading questions or 
ambiguous questions included? 

(3) Was the collecting agency unbiased, or did it “have an axe to 
grind”? It is well to remember that bias may enter either consciously or 
unconsciously. 

(4) Was a selective factor introduced because of careless enumeration? 
For example, in an unemployment study, canvassers might be careless 
about following up their calls at houses where no one was at home, and 
thus perhaps the data would show a smaller number of employed persons 
than actually existed. 

(5) Were the enumerators capable and properly trained? Incom- 
petent or poorly trained enumerators cannot be depended upon to pro- 
duce useful results. 

(6) Was the editing carefully and conscientiously done? Careless 
coding or computing on the part of editors may render of little value the 
findings of an otherwise valuable study. 

(7) Was the tabulating (tally sheets, sorting, or mechanical tabula- 
tions) performed with care and accurately checked? 

(8) In view of the definitions used, the area studied, and the methods 
of procedure, are the data applicable to the problem that is under investi- 
gation? 

It is not always possible to ascertain the quality of work which was 
done by enumerators, editors, and tabulators. As just noted, primary 
sources are apt to reproduce a copy of the schedule used and give a more 
or less adequate description of the methods and procedures followed. 
Additional information may frequently be had by correspondence. 

When using data over a period of years from a given source, we must be 
sure that definitions of terms have not changed or, if they have changed, 
we must make due allowance for the change if it is possible to do so. For 
example, a new definition of the urban population was used for the 1950 
Census of Population. We shall not take the space to give the old and 
new definitions in this text, but the object of the change was to include, 
as urban, more of the large and densely settled, unincorporated places, 
such as fringe areas around cities and unincorporated places of 2,500 or 
more inhabitants outside of an urban fringe. Data for 1950 were 
tabulated on the basis of both the old and the new definitions and showed 
an urban population of 88,927,464 using the old definition and 96,467,686 
on the basis of the new definition. For preceding censuses, data are avail- 
able only upon the basis of the old definition. 
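
The effect of the change in definition can be measured directly from the two 1950 figures. The short computation below is offered only as an illustration of the allowance that a comparison with earlier censuses would require; the variable names are assumptions.

```python
URBAN_1950_OLD_DEFINITION = 88_927_464
URBAN_1950_NEW_DEFINITION = 96_467_686

# Persons classed as urban only under the new definition.
difference = URBAN_1950_NEW_DEFINITION - URBAN_1950_OLD_DEFINITION

# The same difference as a percentage of the old-definition figure.
relative = 100 * difference / URBAN_1950_OLD_DEFINITION

print(f"{difference:,} persons, {relative:.1f} per cent of the old-definition figure")
```

A series that used the old definition through 1940 and the new one for 1950 would show this amount as growth, although it arises solely from the change in definition.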

The new definition and the nature of the change are given in U. S. Bureau of the 
Census, U. S. Census of Population: 1950, Vol. II, Characteristics of the Population, 
Part 1, U. S. Summary, pp. 9-10. 

Newspapers are not ordinarily good sources of statistical data, par- 
ticularly when the figures are in a news item. One reason for this is that 
newspaper copy is prepared and printed so rapidly that the material 
cannot be as carefully proofread as can the contents of magazines and 
books. In addition, many figures quoted in news items are taken from 
speeches or statements from individuals who are themselves sources of 
dubious reliability. As an example consider this statement, made in a 
news item in one of the country's leading newspapers: “The estimated 
1952-53 (Australian) wool clip is 3,740,000 bales, the largest on record. 
Competent observers consider destruction of the rabbits (which ate grass 
intended for sheep) has added 25,000,000 bales to the clip.” There is 
no way of ascertaining, from the news item, which figure is correct. 
However, the first figure is approximately right, the second figure being 
grossly incorrect. 

Comparability of data from different sources. When data are to 
be drawn from two or more sources, the reliability of each source must be 
considered and, in addition, the user must be sure that the data from the 
different sources are comparable. Let us list some of the reasons for lack 
of comparability: 

(1) Different definitions of terms may have been used. Coal produc- 
tion is given by the United States Bureau of Mines in short tons of 2,000 
pounds, while at one time exports of coal were shown by the Bureau of 
Foreign and Domestic Commerce in long tons of 2,240 pounds. Short 
tons are now used by both bureaus. United States stocks of raw and 
refined sugar are reported by the Department of Agriculture in short tons; 
Cuban stocks of raw sugar are given by the Weekly Statistical Sugar Trade 
Journal in Spanish tons. A Spanish ton contains 2,271.64 English 
pounds. As if these three sorts of tons were not sufficiently confusing, it 
is necessary to be aware of two other “tons” used in shipping. These are 
the gross ton and the net (or registered) ton, each of which represents 
100 cubic feet. Gross tonnage is the capacity of the hull plus the enclosed 
spaces above deck available for cargo, stores, passengers, and crew; 
whereas net tonnage is the gross tonnage less the space occupied by pro- 
pelling machinery, fuel, crew quarters, master's cabin, and navigation 
spaces; in other words, approximately the space available for cargo and 
passengers. 

Because of different accounting systems, the term “profit” may have 
different meanings in different industries. Profit for a railroad may be 
quite different from profit for a department store. In a certain industry, 
carried on almost solely by partnerships, an investigator found that many 
firms showed little or no profit and that great differences were present 
among firms. The partners were frequently paying themselves generous 
salaries, and therefore a new term, “profit plus partners' salaries,” was 
used for the study. Ages may be reported as of the last birthday; as of 
the nearest birthday; or, in Oriental fashion, as of the next birthday. 
Comparability of age data is thus affected by the bases of reporting. 

(2) Different methods of computation or estimation may have been 
employed. For example, the methods of estimating population were 
responsible for two different inter-censal estimates of the July 1, 1935, 
population of Yonkers, N. Y. One organization announced the popula- 
tion to be 144,233 while another estimated it as 157,455. The lower 
estimate assumed that Yonkers had grown, since 1930, the same per- 
centage as had the United States, the growth of the United States being 
determined by considering the excess of births over deaths and figures of 
net immigration. The second estimate appears to have been arrived at 
by assuming that the percentage change in the population of Yonkers 
from 1930 to 1935 was about one-half of the percentage change from 1920 
to 1930. 

(3) The samples may have been so chosen that the results are not com- 
parable. Or, perchance, one study may have been based on a sample 
whereas the other was a complete enumeration. It is, of course, possible 
so to choose a sample that the results of a study may be forced to fit a 
preconceived idea. 

(4) Different standards of accuracy may have prevailed with respect 
to enumeration, editing, and tabulating. 

(5) The sources may not be comparable in respect to areas included, 
or in respect to the period of time to which they refer. When the chrono- 
logical difference is not too great, comparisons may sometimes be made or 
adjustments effected. 
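
To return to the three weight tons mentioned under point (1): figures stated in different tons must be reduced to a common unit before they are compared. The sketch below converts by way of pounds, using the equivalents given in the text; the function name and the quantity of 1,000 tons are assumptions. The gross and net tons used in shipping are measures of volume and are deliberately left out.

```python
# Pounds per ton, as given in the text.
POUNDS_PER_TON = {
    "short": 2000.0,
    "long": 2240.0,
    "spanish": 2271.64,
}


def convert_tons(quantity, from_unit, to_unit):
    """Convert a weight from one kind of ton to another, by way of pounds."""
    pounds = quantity * POUNDS_PER_TON[from_unit]
    return pounds / POUNDS_PER_TON[to_unit]


# 1,000 long tons of coal exports expressed in short tons.
exports_short = convert_tons(1000, "long", "short")     # 1120.0
# 1,000 Spanish tons of raw sugar expressed in short tons.
sugar_short = convert_tons(1000, "spanish", "short")
```

A difference of 12 per cent between long and short tons is large enough to distort a comparison of production with exports if it is overlooked.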

Whether an investigator is using primary or secondary sources, it is 
necessary to keep on the lookout for obvious mistakes and misprints. 
For example, a secondary source stated that in Continental United States, 
in 1930, potential water power amounting to 38,110,000 horse power was 
available 90 per cent of the time, while potential water power of 9,166,000 
horse power was available 50 per cent of the time. It is clear that there 
must be a greater potential horse power available for 50 per cent of the 
time than for 90 per cent of the time. Data were given for each state and, 
if these details are added, it appears that 59,166,000 horse power of poten- 
tial water power were available 50 per cent of the time. Obviously this 
was a typographical mistake which occurred in printing the publication, 
or possibly was carried over from the primary source. Such an apparent 
contradiction would be observed at once by the experienced user of 
figures. 
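
The reasoning in this example can be stated as a simple rule: power available 50 per cent of the time cannot be less than power available 90 per cent of the time. A minimal sketch of such a check follows; the function name is hypothetical, and the figures are those quoted in the text.

```python
def availability_is_consistent(hp_90_per_cent, hp_50_per_cent):
    """True if the 50 per cent figure is at least as large as the 90 per cent figure."""
    return hp_50_per_cent >= hp_90_per_cent


# Figures as printed in the secondary source.
as_printed = availability_is_consistent(38_110_000, 9_166_000)

# Figure obtained by adding the state details.
from_state_details = availability_is_consistent(38_110_000, 59_166_000)
```

The printed total fails the check and the total built up from the state details passes it, which points to the dropped digit.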



CHAPTER 3 

Statistical Tables 


METHODS OF PRESENTATION 

Four methods of statistical presentation are available. Data may be 
(1) incorporated in a paragraph of text, (2) put into tabular form, (3) 
placed in a semi-tabular arrangement, or (4) expressed graphically. 

Text presentation. Combining figures and text is not a particularly 
effective device, since it is necessary to read, or at least scan, all of the 
paragraph before one can grasp the meaning of the entire set of figures. 
Most persons cannot easily comprehend the data when set forth in this 
manner, and it is especially difficult for the reader to single out individual 
figures. There is the advantage, however, that the writer can direct 
attention to, and thus emphasize, certain figures and can also call atten- 
tion to comparisons of importance. Following is an example of text 
presentation: 

The 1950 Census of Population of the United States enumerated 
665,149 males and 659,940 females in Colorado. This state, the most 
populous in the Mountain division, had 568,778 males and 554,518 
female inhabitants in 1940. Next in population to Colorado, at the time 
of the Seventeenth (1950) Census, was Arizona, which had 379,059 
males and 370,528 females. At the 1940 enumeration, Arizona had but 
258,170 males and 241,091 females, a smaller total population than 
Utah, New Mexico, Montana, or Idaho. In 1950, Utah was third 
among the Mountain states, with 347,636 males and 341,226 females. 
At the time of the Sixteenth Census, Utah showed 278,620 males and 
271,690 females. Fourth, in 1950, was New Mexico, with its population 
consisting of 347,544 males and 333,643 females. At the preceding 
census, New Mexico had 271,846 males and 259,972 females. Montana 
was next in population after New Mexico, in 1950, with 309,423 males 
and 281,601 females. In 1940 Montana's population consisted of 
299,009 males and 260,447 females. Idaho followed Montana with 
303,237 males and 285,400 females in 1950. A decade earlier, Idaho 
had 276,579 males and 248,294 females. Next to smallest in the 
division in regard to population was Wyoming, where 154,853 males 
and 135,676 females were enumerated in 1950. Ten years before, 
Wyoming had 135,055 males and 115,687 females. Least populous of 
all the Mountain states was Nevada, with 85,017 males and 75,066 
females in 1950 and 61,341 males and 48,906 females in 1940. 


Tabular presentation. The same data that were included in the 
preceding text statement are shown in Tables 3.1 and 3.3. This method 
of setting forth statistical data is usually superior to the use of text. A 
table with its title should be fully self-explanatory, although it may fre- 
quently be accompanied by a paragraph of interpretation or a paragraph 
directing attention to important figures. 

TABLE 3.1

Number of Inhabitants in the States of the Mountain Division,
by Sex, 1940 and 1950

                       Male                 Female
State              1950       1940       1950       1940

Colorado        665,149    568,778    659,940    554,518
Arizona         379,059    258,170    370,528    241,091
Utah            347,636    278,620    341,226    271,690
New Mexico      347,544    271,846    333,643    259,972
Montana         309,423    299,009    281,601    260,447
Idaho           303,237    276,579    285,400    248,294
Wyoming         154,853    135,055    135,676    115,687
Nevada           85,017     61,341     75,066     48,906

Data from U. S. Bureau of the Census, U. S. Census of Population: 1950, Vol. II,
Characteristics of the Population, Table 13 in the Part for each state.


It is readily seen that the table is much briefer than the text statement, 
since the row and column headings eliminate the necessity of repeating 
explanatory matter. As no text appears with the figures, the presenta- 
tion is more concise. The logical arrangement of items in the stub (the 
left-hand column and its heading) and box head (the headings of the other 
columns) makes a table clear and easy to read. The use of columns and 
rows for the figures facilitates comparisons. 
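
The economy gained from headings can be illustrated with a short Python sketch. It stores three rows of Table 3.1 once and lets the box head carry the explanatory matter that the text statement had to repeat for every figure. The function name render_table and the column width are assumptions made for this illustration only.

```python
# Figures for three states, copied from Table 3.1.
# Each column is identified by a (sex, census year) pair.
COLUMNS = [("Male", 1950), ("Male", 1940), ("Female", 1950), ("Female", 1940)]

ROWS = {
    "Colorado": [665149, 568778, 659940, 554518],
    "Arizona": [379059, 258170, 370528, 241091],
    "Utah": [347636, 278620, 341226, 271690],
}


def render_table(rows, columns, stub_head="State", width=13):
    """Return a plain-text table: a stub at the left, one column per heading."""
    stub_width = max([len(stub_head)] + [len(name) for name in rows])
    # Box head: each explanatory label is written once, not once per figure.
    head = stub_head.ljust(stub_width) + "".join(
        f"{sex} {year}".rjust(width) for sex, year in columns
    )
    lines = [head]
    for name, figures in rows.items():
        cells = "".join(f"{value:,}".rjust(width) for value in figures)
        lines.append(name.ljust(stub_width) + cells)
    return "\n".join(lines)


print(render_table(ROWS, COLUMNS))
```

Each state name and each heading appears exactly once in the output, whereas the text statement needed a full clause for every pair of figures.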

In Table 3.2 the various parts of a table have been slightly separated 
and labeled for identification. A table will have at least the four essen- 
tials: title, stub, box head, and body. There may also be present a prefa- 
tory note (see Table 12.2 or 12.3) and one or more footnotes, as in Table 
3.2. If the figures in the table are not original, a source note is also 
included, sometimes with the prefatory note but usually below the table 
and below the footnotes to the table, if any are present. 

Semi-tabular presentation. When only a few figures are to be used 
in a discussion, the text may be broken and the data listed as follows: 

The number of passenger-miles flown per passenger fatality, by 
scheduled domestic air lines, was 

    87,118,531 in 1950, 
    79,111,993 in 1951, 
    282,536,326 in 1952. 


TABLE 3.2

[Title]
Population and Area of the United States, Territories, Possessions,
and Other Holdings, 1950

[Stub heading and box head]
                                              Population         Gross area
                                                       Per cent   in square
Region                                         Number  of total       miles

[Stub and body]
    Total                                 154,233,234    100.00   3,628,130
Continental United States                 150,697,361     97.71   3,022,387
Territories:
  Hawaii                                      499,794      0.32       6,423
  Alaska                                      128,643      0.08     586,400
Possessions:
  Puerto Rico                               2,210,703      1.43       3,435
  Guam                                         59,498      0.04         206
  Virgin Islands of the U. S.                  26,665      0.02         133
  American Samoa                               18,937      0.01          76
  Midway Islands                                  416        **           2
  Wake Island                                     349        **           3
  Other islands*                                  354        **          33
Canal Zone†                                    52,822      0.03         553
Corn Islands#                                   1,304        **           4
Trust Territory of the Pacific Islands         54,843      0.04       8,475
Population abroad‡                            481,545      0.31         ...

[Footnotes]
* For a list of the islands, banks, reefs, and cays included in this category, see the
source given below. For some islands the area was not available.

† Under jurisdiction of the United States by treaty with the Republic of Panama.

# Leased from the Republic of Nicaragua. Population data are those of the
May 1950 census of the Republic of Nicaragua.

‡ Excludes citizens abroad on private business, travel, and so forth, many of
whom were enumerated at their usual place of residence. Population data estimated
from a sample.

** Less than one-hundredth of one per cent.

[Source note]
Data from U. S. Bureau of the Census, U. S. Census of Population: 1950, Vol. I,
Number of Inhabitants, Table 1 of United States Summary.


This method is not often used, but it is serviceable in that the figures 
are made to stand out from the text as they would not do if worked into 
one or two sentences. Incidentally, the figures can be more readily 
compared than if they were in the text. 

Graphic presentation. Graphic devices are extremely useful and 
effective for quickly presenting a limited amount of information. The 
three following chapters deal with curves, bar charts, maps, and other 
statistical diagrams. 

LEADING CONSIDERATIONS 

Types of tables. From the point of view of usage, there are two types 
of tables. In the first place there are general or reference tables, which 
are used as a repository of information. These are frequently very 
extensive, covering many pages, as, for example, United States Table 19 
in Population Volume I of the 1950 Census, which covers 13 pages. Such 
tables give detailed information arranged for ready reference. In a 
general table no attempt is made to arrange the entries so that emphasis 
will be placed on certain items, nor is there usually any reason for arrang- 
ing columns and rows in order to bring out comparisons desired by the 
investigator. The primary, and usually sole, purpose of a reference table 
is to present the data in such a manner that individual items may be 
found readily by a reader. Reference or general tables are often placed 
in an appendix or a separate part of a published report.¹ 

¹ See, for example, Part 5 of the Annual Report of the Federal Deposit Insurance 
Corporation for the Year Ended December 31, 1952. 

In the second place there are summary or text tables, which are usually 
relatively small in size and which are designed to set forth one finding or a 
few closely related findings as effectively as possible. While the reference 
table may be rather complicated, with subheadings and sub-subheadings 
in stub and caption, the summary table should be relatively simple in 
construction. It frequently accompanies a text discussion and hence is 
also referred to as a text table. If a reader is expected to divert his atten- 
tion from a running discourse to a table, it is essential that the table be 
not too formidable, but simple and easy to understand. Too many 
readers have a tendency to skip all the tables in a report. This tendency 
can be combatted successfully only by making tables appear so simple 
as to be interesting and by introducing graphs that are attractive and not 
unduly complicated. Because of the purpose which a summary table is 
to serve, the items shown therein will be arranged to place emphasis 
where desired, and the columns and rows will be so placed as to facilitate 
the comparisons of paramount importance. 

A summary table is almost invariably the result of boiling down infor- 
mation contained in one or more reference tables, although upon occasion 
a summary table may be based, in whole or in part, upon one or more 
other summary tables. Still more rarely, a summary table may be con- 
structed directly from data contained in schedule forms. The methods 
which can be used in deriving one table from one or more others are: 

1. Data which are not important for the problem in hand may be 
omitted. Thus, although there are about twenty states which produce 
sizeable amounts of bituminous coal, it might suffice to show separate 
data for only the ten or twelve leading states. 

2. Detailed data may be combined into groups. For example, data 
shown by states may be grouped into geographical divisions. Again, 
data shown by individual industries may be combined into broader indus- 
trial groups. For example, the manufacture of brick, tile, and terra 
cotta products; of cement, glass, and pottery; and the quarrying of 
marble, granite, slate, and like products may be combined into the major 
category “clay, stone, and glass products.” 

3. The arrangement of data may be altered. Thus an alphabetical 
arrangement of cities may be replaced by an arrangement according to 
size of municipality. 

4. Averages, ratios, percentages, or other computed measures may be 
substituted for, or given in addition to, the original absolute figures. A 
column of percentages is shown in Table 3.5. It will be observed that 
these figures facilitate the interpretation of the data upon which they 
are based. 
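
The four methods listed above can be sketched in Python. The figures are those of Table 3.5, arranged alphabetically as a reference table might list them. The function names, and in particular the grouping into "Southern" and "Northern" New England, are assumptions introduced only for illustration; they are not taken from any census publication.

```python
# White and foreign-born white population, 1950, copied from Table 3.5.
# The alphabetical order imitates a reference table.
REFERENCE = {
    "Connecticut": (1952329, 297859),
    "Maine": (910846, 74342),
    "Massachusetts": (4611503, 713699),
    "New Hampshire": (532275, 58134),
    "Rhode Island": (777015, 113264),
    "Vermont": (377388, 28753),
}

# Hypothetical grouping, introduced only to illustrate method 2.
GROUPS = {
    "Southern New England": ["Connecticut", "Massachusetts", "Rhode Island"],
    "Northern New England": ["Maine", "New Hampshire", "Vermont"],
}


def omit(table, keep):
    """Method 1: keep only the rows needed for the problem in hand."""
    return {name: row for name, row in table.items() if name in keep}


def combine(table, groups):
    """Method 2: add the detailed rows of each group, column by column."""
    return {
        label: tuple(sum(column) for column in zip(*(table[m] for m in members)))
        for label, members in groups.items()
    }


def by_magnitude(table, column=0):
    """Method 3: rearrange the rows, largest value in `column` first."""
    return dict(sorted(table.items(), key=lambda item: item[1][column], reverse=True))


def with_percentage(table):
    """Method 4: append foreign-born as a percentage of the white population."""
    return {
        name: (white, foreign, round(100 * foreign / white, 1))
        for name, (white, foreign) in table.items()
    }


summary = by_magnitude(with_percentage(REFERENCE), column=2)
for name, (white, foreign, pct) in summary.items():
    print(f"{name:<15}{white:>12,}{foreign:>10,}{pct:>7.1f}")
```

Applying methods 4 and 3 in turn yields the rows in the same order, and with the same percentages, as Table 3.5.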

Comparisons. While the arrangement into columns and rows 
makes it easy to compare the data, such treatment does not automatically 
focus attention upon the comparisons that are important. This may be 
effected by placing the figures to be compared in contiguous columns or 
rows. Thus it may be seen that Table 3.1 facilitates the comparison of 
data obtained at the two censuses for either males or females, while Table 
3.3 makes it easy to compare the number of males and females enumerated 
at either census. 


TABLE 3.3

Number of Inhabitants in the States of the Mountain Division,
by Sex, 1940 and 1950

                       1950                  1940
State              Male     Female       Male     Female

Colorado        665,149    659,940    568,778    554,518
Arizona         379,059    370,528    258,170    241,091
Utah            347,636    341,226    278,620    271,690
New Mexico      347,544    333,643    271,846    259,972
Montana         309,423    281,601    299,009    260,447
Idaho           303,237    285,400    276,579    248,294
Wyoming         154,853    135,676    135,055    115,687
Nevada           85,017     75,066     61,341     48,906

Data from U. S. Bureau of the Census, U. S. Census of Population: 1950, Vol. II,
Characteristics of the Population, Table 13 in the Part for each state.


Each of these tables is well constructed, but each focuses attention upon 
a different comparison. One of the most important considerations in 
table construction is that figures which are to be compared must be placed 
in immediate juxtaposition. It should be remembered that two or more 
series of figures are more easily compared when placed in adjacent columns 
than when placed in adjacent rows, and that figures of a series are more 
easily compared with each other when arranged in a column than when 
placed in a row. 
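
The difference between the two layouts amounts to a choice of which classification spans the box head. The following Python sketch assumes a hypothetical helper, column_order, which lists the four columns with either sex or census year as the spanning heading; the first ordering corresponds to Table 3.1 and the second to Table 3.3.

```python
from itertools import product

SEXES = ("Male", "Female")
YEARS = (1950, 1940)

# Colorado figures, copied from Table 3.1.
COLORADO = {
    ("Male", 1950): 665149,
    ("Male", 1940): 568778,
    ("Female", 1950): 659940,
    ("Female", 1940): 554518,
}


def column_order(spanning):
    """List (sex, year) columns grouped under the chosen spanning heading."""
    if spanning == "sex":   # as in Table 3.1: the two censuses sit side by side
        return [(sex, year) for sex, year in product(SEXES, YEARS)]
    if spanning == "year":  # as in Table 3.3: the two sexes sit side by side
        return [(sex, year) for year, sex in product(YEARS, SEXES)]
    raise ValueError("spanning must be 'sex' or 'year'")


for spanning in ("sex", "year"):
    row = [f"{COLORADO[column]:,}" for column in column_order(spanning)]
    print(spanning.ljust(5), "  ".join(row))
```

The same four figures are printed in both rows; only the neighbors of each figure change, and with them the comparison that the eye makes first.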

TABLE 3.4

Population and Area of the United States, Territories, Possessions,
and Other Holdings, 1950

                                              Population         Gross area
                                                       Per cent   in square
Region                                         Number  of total       miles

    Total                                 154,233,234    100.00   3,628,130

Continental United States                 150,697,361     97.71   3,022,387
Territories:
  Hawaii                                      499,794      0.32       6,423
  Alaska                                      128,643      0.08     586,400
Possessions:
  Puerto Rico                               2,210,703      1.43       3,435
  Guam                                         59,498      0.04         206
  Virgin Islands of the U. S.                  26,665      0.02         133
  American Samoa                               18,937      0.01          76
  Midway Islands                                  416        **           2
  Wake Island                                     349        **           3
  Other islands*                                  354        **          33
Canal Zone†                                    52,822      0.03         553
Corn Islands#                                   1,304        **           4
Trust Territory of the Pacific Islands         54,843      0.04       8,475
Population abroad‡                            481,545      0.31         ...

* For a list of the islands, banks, reefs, and cays included in this category, see the source given
below. For some islands the area was not available.

† Under jurisdiction of the United States by treaty with the Republic of Panama.

# Leased from the Republic of Nicaragua. Population data are those of the May 1950 census of
the Republic of Nicaragua.

‡ Excludes citizens abroad on private business, travel, and so forth, many of whom were enumer-
ated at their usual place of residence. Population data estimated from a sample.

** Less than one-hundredth of one per cent.

Data from U. S. Bureau of the Census, U. S. Census of Population: 1950, Vol. I, Number of Inhabi-
tants, Table 1 of United States Summary.


Comparisons may be greatly facilitated by the use of ratios, percent- 
ages, averages, or other computed relationships. Ratios are shown in 
Table 7.4; percentages, which are really a form of ratio (see Chapter 7), 
are included in Tables 3.4, 3.5, and 3.7. Ratios and percentages are par- 
ticularly useful when the absolute figures to be compared are large. 
Note that in Tables 3.4 and 3.5 rather large population figures can be 
compared readily by the use of percentages. When tables show monthly 
fluctuations and both maxima and minima are noted, as in Table 3.7, the 
additional entry “minimum as percentage of maximum” is useful for 
purposes of comparison. Averages are shown in Tables 14.1, 14.3, and 
14.7. 
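
The entry “minimum as percentage of maximum” is simple to compute. The sketch below applies it to the monthly totals of privately financed dwelling units shown in Table 3.7; the function name is an assumption made for this illustration.

```python
# Privately financed dwelling units started, by month, 1952
# (the "Total" column under "Privately financed" in Table 3.7).
PRIVATE_TOTAL = [
    61400, 74300, 91100, 97000, 101000, 96900,
    101100, 97400, 99200, 99200, 82300, 67600,
]


def minimum_as_percentage_of_maximum(values, places=1):
    """Express the smallest figure of a series as a percentage of the largest."""
    return round(100 * min(values) / max(values), places)


print(minimum_as_percentage_of_maximum(PRIVATE_TOTAL))  # 60.7, as entered in Table 3.7
```

A single figure of this kind summarizes the range of the monthly fluctuation and can be compared directly from one column to the next.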
Emphasis. The proper placing of an item in a table enables it to be 
given suitable emphasis. Since occidentals read from left to right and 
from top to bottom, it follows that the most prominent position in the 
stub is at the top, and in the box head the most prominent position is at 
the left; likewise, the position of least prominence is at the bottom of the 
stub and at the right of the box head. Notice that, by following this 
principle in Table 3.3, males were emphasized rather than females, and 
1950 was placed in a more prominent position than 1940. 

TABLE 3.5

White Population and Foreign-born White Population of the
New England States, 1950

                                   Foreign-born    Per cent
                        White          white        foreign
State                 population     population        born

Massachusetts          4,611,503        713,699        15.5
Connecticut            1,952,329        297,859        15.3
Rhode Island             777,015        113,264        14.6
New Hampshire            532,275         58,134        10.9
Maine                    910,846         74,342         8.2
Vermont                  377,388         28,753         7.6

Data from U. S. Bureau of the Census, U. S. Census of Population: 1950, Vol. II,
Characteristics of the Population, Part 1, United States Summary, p. 1-106.


Totals are generally placed in either the most prominent or the least 
prominent position, depending upon whether or not it is desired to give 
emphasis to them. When “total” is shown at the top in the stub, a line 
should be placed below the first row of figures, as in Table 3.4. If the 
total entry is at the bottom of the stub, the figures are set off by a line 
drawn above them, as in Table 3.7. An alternative procedure consists 
of using a space instead of a line to set off the totals. Whatever its posi- 
tion, the word “total” in the stub should be indented if possible. 

Individual figures, or columns or rows of figures, may also be empha- 
sized by the use of boldface type, as in Table 3.6. When monthly 
fluctuations of employment, sales, or other factors are shown, the maxi- 
mum figure may be set in boldface and the minimum may be put in italic 
type, as in Table 3.7. In general, italics are used to indicate an exception 
rather than for emphasis. Thus, in Tables 1 and 18 of Agricultural 
Statistics 195$, the figures in italics are census returns, whereas all other 
figures are compilations or estimates made by the Bureau of Agricultural 
Economics. Italics are also sometimes used to show deficits, items to be 
subtracted in arriving at a total, and items to be omitted from a total. 

TABLE 3.6

Analysis of Disbursements and Recoveries of the Federal Deposit Insurance
Corporation in Transactions for Protection of Depositors and to Facili-
tate Termination of Liquidations, 1934-1952

(In thousands)

                                               Transactions for the protection           Transactions to
                                                        of depositors                         facilitate
                                                     Total  Receivership    Absorption    termination of
                                               (420 banks)         cases         cases      liquidations
Item                                                         (245 banks)   (175 banks)

Disbursements                                     $322,148       $87,827      $234,321            $2,993
  Principal                                        276,044        87,044       189,000             2,716
  Payoff expenses (nonrecoverable)                     783           783           ...               ...
  Liquidation expenses                              13,266           ...        13,266               277
  Advances for asset protection                     32,055           ...        32,055               ...

Recoveries and income                              302,448        73,213       229,235             3,789
  Principal recovery to Dec. 31, 1952              247,392        72,866       174,526             1,691
  Estimated additional recovery of principal*        1,020           ...         1,020             1,005
  Liquidation expenses                              13,266           ...        13,266               277
  Advances                                          32,055           ...        32,055               ...
  Interest and allowable return (profit and
    income in termination transactions)              8,715           347†        8,368#              816‡

Net loss of funds                                   19,700        14,614         5,086              -796**
  On principal                                      27,632        14,178        13,454                20
  Payoff expenses (nonrecoverable)                     783           783           ...               ...
  Less: interest and income                          8,715           347†        8,368#              816‡

* Book value of remaining unliquidated assets less reserve for losses. The total amount for both
types of transactions, $2,025,139, is designated in Table 10 (of the Report) as “Assets acquired through
bank suspensions and absorptions.”

† Interest on subrogated claims in 58 of the receivership cases in which receivers paid 100 per cent
dividends on creditors’ claims.

# Interest on loans and allowable return on purchase price in 91 absorption cases in which collections
exceeded the Corporation’s disbursements and recoverable expenses. In 05 of these cases full interest
or allowable return was collected and excess collections of $1,519,000 returned to the banks.

‡ Profit plus net income (income on assets less liquidation expenses).

** Excess of receipts.

From Annual Report of the Federal Deposit Insurance Corporation for the year ended December 31,
1952, p. 10.


Arrangement of items in stub and caption. Considering the basic 
nature of statistical data which may be encountered, it was noted (page 3) 
that data may refer to geographical, chronological, qualitative, or quanti- 
tative classifications. We are now interested in the methods which may 
be employed in arranging the items in the stub or the box head of a table. 
The method of arrangement will be determined partly by the nature of 
the data (whether basically geographical, chronological, qualitative, or 
quantitative), and partly by a consideration of whether the data are to 
appear in a reference table or in a summary table. A number of different 
methods of arrangement may be employed. 

Alphabetical. This method of arrangement is admirably adapted for 
use in a general table, because it enables individual items to be located 
with ease. It is, obviously, not a useful method for text tables. It can be 
used only with series which are classified geographically or qualitatively. 

TABLE 3.7

Number of New Permanent Non-farm Dwelling Units Started in Urban and
Rural Locations, by Source of Funds, January-December 1952

                                                                                   Privately and
                    Privately financed             Publicly financed             publicly financed
                           Rural                         Rural                         Rural
Month            Urban  non-farm     Total     Urban  non-farm     Total     Urban  non-farm     Total

January         32,800    28,600    61,400     3,300       200     3,500    36,100    28,800    64,900
February        39,700    34,600    74,300     3,100       300     3,400    42,800    34,900    77,700
March           46,600    44,500    91,100    11,900       900    12,800    58,500    45,400   103,900
April           50,400    46,600    97,000     8,600       600     9,200    59,000    47,200   106,200
May             52,400    48,600   101,000     8,300       300     8,600    60,700    48,900   109,600
June            49,900    47,000    96,900     6,200       400     6,600    56,100    47,400   103,500
July            50,900    50,200   101,100     1,500         *     1,500    52,400    50,200   102,600
August          49,400    48,000    97,400     1,400       300     1,700    50,800    48,300    99,100
September       51,300    47,900    99,200     1,500       100     1,600    52,800    48,000   100,800
October         52,100    47,100    99,200     1,700       200     1,900    53,800    47,300   101,100
November        42,300    40,000    82,300     3,700       100     3,800    46,000    40,100    86,100
December        36,800    30,800    67,600     3,800       100     3,900    40,600    30,900    71,500

  Total        554,600   513,900 1,068,500    55,000     3,500    58,500   609,600   517,400 1,127,000

Minimum as
percentage
of maximum        62.6      57.0      60.7      11.8       ...      11.7      59.5      57.4      59.2

* Fewer than 50 units.

Data from Monthly Labor Review, May 1953, p. 589.


Geographical. The geographical method of arrangement may be 
employed for series classified geographically, but it is applicable only 
when an established usage has been set up and should be used only when 
the statistician is sure that his readers are familiar with the classification. 
The customary order of the geographic divisions of the United States and 
of the various states may be seen in Table 6 of the United States Sum- 
mary, in Volume I of the 1950 United States Census of Population. 
Although the Census makes frequent use of the geographical method of 
arrangement for the states, it almost invariably lists the counties of a 
state alphabetically. For ease of reference, in a general table, the geo- 
graphical arrangement is hardly so satisfactory as the alphabetical. 
Although it may be argued that the geographical arrangement often 
places together contiguous, and therefore comparable, areas, it must be 
obvious that the geographical arrangement does not always do so. It is 
not usually a good method of arrangement for a summary table, since 
this arrangement does not place important items in prominent positions. 

Magnitude. A very satisfactory method of arranging items in a sum- 
mary table consists of listing them according to size, usually with the 
largest item first, but sometimes with the order reversed. The states 
shown in the stub of Table 3.3 are given in order of magnitude. When 
the largest item is placed first, the most important items (numerically) 
are placed in the most prominent positions. Arrangement of items 
according to size is not useful in a general table because it does not 
facilitate the finding of individual items as does the alphabetical arrange- 
ment. Data classified geographically or qualitatively may be arranged 
according to magnitude. So also may data classified chronologically, but 
they lose their chronological sequence when arranged by magnitude. 

Historical. Data classified on a chronological basis would generally 
be arranged chronologically or historically. When years are listed, either 
the most recent or the earliest date may be shown first. The months, 
however, are customarily listed with January first. When the historical 
arrangement is called for, it may be used in either general or text tables. 
The historical arrangement is used in the stub of various tables in 
Chapter 12. 

Customary. Certain data that are basically qualitative are generally 
arranged according to customary classes. Exports and imports are 
often grouped into five categories: crude materials, crude foodstuffs, 
manufactured foodstuffs, semi-manufactures, and finished manufactures. 
The population of the United States, when divided into groups upon a 
so-called “race-nativity” basis, is usually subdivided into the following 
classes: native White, foreign-born White, Negro, Indian, Japanese, 
Chinese, and “all other.” These are ordinarily listed in the order given. 
When an “all other” group appears in a table, it is ordinarily placed at 
the bottom in the stub, or at the right in the box head. Good statistical 
practice dictates that an “all other,” “miscellaneous,” or “not reported” 
group should include relatively small numbers; otherwise, the adequacy of 
the classification or the accuracy of the collection of the data may be ques- 
tioned. Arrangement by customary classes is appropriate for either a 
text or a reference table. Quantitative data may be arranged into classes 
as shown in the stub of Table 8.6. Such arrangements usually begin with 
the class of smallest numerical value and may be used in either a text or 
a reference table. 

Progressive. This method of arrangement is illustrated in the stub of 
Table 3.6. Notice that the items are listed in such a way that the final 
figure develops logically from those given before. Another example of 
the progressive arrangement was shown in the box head of a table which 
presented monthly data of the number of strikes in the United States 
during a year. The progressive headings in the box head were: 

    Continued                  In
    from          Beginning    progress               In effect
    preceding     in           during      Ended in   at end of
    month         month        month       month      month


The progressive arrangement is suitable for either text or reference tables. 

Numerical. The wards of cities are usually designated as Ward 1, 
Ward 2, and so forth. When data for such subdivisions are shown, a 
numerical arrangement is generally followed. The precincts and districts 
of counties are sometimes numbered; the departments of a factory and 
salesmen's territories or sales areas may also be identified by numerical 
designations. This method may appear in either a text or a reference 
table. The numbers assigned to the categories are frequently only labels 
serving to identify some underlying arrangement. For example, in a 
shoe factory, Department 1 was the cutting department; Department 2, 
the fitting department; Department 3, the lasting department; and so 
forth. 

In using the various methods of arrangement, remember that in a 
reference table, the items should be arranged for greatest ease of reference, 
whereas in a text table the arrangement should be designed to emphasize 
the important items and to stress the proper comparisons. 

DETAILS OF TABLE CONSTRUCTION 

Title and identification. A title should accompany every table and 
is customarily placed above the table. The title should be clearly worded 
and should state briefly what data are shown in the table. A title should 
be so worded as to mention the more important considerations first, 
placing toward the end any statement concerning how the items are 
arranged and what period of time is covered. In general the title states, 
in order: what, where, how classified, and when. Illustrations of titles 
are shown in the various tables of this chapter. It will be noted that, 
when a title necessitates the use of several lines, an inverted-pyramid 
arrangement is used. 

If a title is long, it may be advantageous to place a “catch title” above 
the main title or, occasionally, to substitute the catch title for the full 
title. This shorter title undertakes merely to state the general nature of 
the data in the table. For Table 3.7 a catch title might read “New 
Dwelling Units in 1952.” 

When more than one table is included in a study, it is desirable to num- 
ber the tables consecutively in order that each one may be identified by 
number rather than by title. 

Prefatory note and footnotes. A prefatory note, one or more foot- 
notes, and a source note may be appended to a table. A prefatory note is 
placed just below the title and in smaller or less prominent type. The 
prefatory note provides an explanation concerning the entire table or a 
substantial part of it, as in Table 3.6. 

Explanations concerning individual figures, or a column or row of 
figures, should be given in footnotes. Footnotes keyed to stub entries 
and column headings may be referred to by means of numbers; however, 
footnotes keyed to figures should be identified by a symbol (*, †, #, ‡, 
etc.), as in Table 3.6, or by a letter, but preferably not by a number. In 
this book, symbols have been used for keying footnotes to figures, stub 
entries, column headings, and titles of tables. 

Source notes. As previously indicated, the source note may appear 
below the title or below the footnotes. The latter practice has been gen- 
erally followed in this text. The data set forth in a table will not often 
be material which the investigator has collected. Usually the figures will 
have been taken from one or more published or unpublished sources. 
The source note should be complete, giving author, title, volume, page, 
publisher, and date. Not only is it courteous to mention the source of 
data quoted, but such information gives the reader some idea of the 
reliability of the data and makes it possible for him to refer to the original 
source to verify quoted figures or to obtain additional information. 

Sometimes data are taken from a secondary source instead of a primary 
source because the secondary source may be more convenient. In such 
a case it may be advisable to mention both sources; for example, “Source: 
National Board of Fire Underwriters as quoted in Statistical Abstract of 
the United States, 1953, p. 470.” 

Data for a table may sometimes be taken from two or more different 
sources. When this is done, care must be exercised to see that the data 
are comparable. The importance of comparability of data was discussed 
in Chapter 2; it is not necessary to say more on that topic at this point. 

When apparent mistakes are found in a source, it is well to call atten- 
tion to the fact. The December 1935 Monthly Labor Review (p. 1503) 
reprints a table from The Oriental Economist showing that total payrolls 
in 10 industries in Japan in 1933 were 647,340,199 yen, but points out in 
a footnote that, if the figures given for each of the 10 industries are added, 
the result is 647,430,199 yen. 

Percentages. When percentages are used in a table, the stub or the 
caption entry should indicate clearly to what figures the percentages 
relate. Thus, the term “per cent” alone should be avoided; rather say, 
“per cent of total,” “per cent of increase or decrease,” and so forth. 
Sometimes tables are divided into a “number” section (showing absolute 
figures) and a “per cent” section, as in Table 7.5. This table and Table 
7.2 illustrate the use of adequate headings referring to percentages. 

The percentages in the last column of Table 8.6 total 99.9, while those 
in the column just to the left of that one total 99.8. When individual 
percentages are written correct to tenths of one per cent, as is customary, 
the total will occasionally be slightly over or below 100.0 because of the 
accumulation of positive or negative remainders when rounding. If the 
percentages had been entered in hundredths or thousandths of a per cent, 
the total would have been closer to 100.0. Although a “per cent of 
total” column may add to slightly more or less than 100.0, the total is 
shown as 100.0, since that is what the individual percentages would yield 
if carried out far enough. If a total adds to less than 99.8 or more than 
100.2, it is advisable to re-check the calculations for mistakes. 
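
The accumulation of rounding remainders can be seen in a small Python sketch. It takes the white population figures of Table 3.5 and expresses each state as a percentage of the six-state total; these shares do not appear in the tables of this chapter and are computed here only for illustration. The function names and the tolerance parameters are likewise assumptions.

```python
# White population of the six New England states, 1950, copied from Table 3.5.
WHITE_POPULATION = {
    "Massachusetts": 4611503,
    "Connecticut": 1952329,
    "Rhode Island": 777015,
    "New Hampshire": 532275,
    "Maine": 910846,
    "Vermont": 377388,
}


def percents_of_total(values, places=1):
    """Return each value as a percentage of the sum, rounded to `places`."""
    total = sum(values)
    return [round(100 * value / total, places) for value in values]


def within_rounding_tolerance(percents, low=99.8, high=100.2):
    """Rule of thumb from the text: re-check if the sum falls outside 99.8 to 100.2."""
    total = round(sum(percents), 6)  # guard against binary floating-point noise
    return low <= total <= high


shares = percents_of_total(list(WHITE_POPULATION.values()))
print(shares)                 # [50.3, 21.3, 8.5, 5.8, 9.9, 4.1]
print(round(sum(shares), 1))  # 99.9, although the true total is 100.0
print(within_rounding_tolerance(shares))
```

Here the rounded entries add to 99.9, which lies within the range that rounding alone can explain, so the total would still be shown as 100.0.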

Rounding numbers. In order to avoid confusion and to facilitate 
comparisons, numbers of many digits may be rounded. Numbers may 
also be rounded because the compiler feels that they are accurate, not to 
the final digit, but only in terms of (say) thousands or millions. The 
figures shown in Table 3.7 were rounded (but no digits dropped!) pre- 
sumably to call attention to the fact that they were estimates. 

When numbers are rounded, a statement to that effect should be made 
in a prefatory note or in the stub or the box head. The wording may be 
“millions of . . . ,” “000,000 omitted,” and like expressions. Tables 
3.6, 7.1, and 7.2 contain rounded numbers, and mention of that fact is 
made in a prefatory note or in the appropriate box head. 

If a series of figures is to be expressed in thousands of dollars, for 
example, the rounding is to the nearest thousand. Thus $2,648,302 would 
become $2,648 (thousand) and $7,226,782 would become $7,227 (thou- 
sand). If the heading “thousands of dollars” appears in the box head 
(or stub) of a table or as a prefatory note, the dollar mark is not needed. 
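
The two figures just cited may be rounded by a short routine such as the one sketched below. The function name is our own, and the treatment of an amount falling exactly halfway between two thousands (here rounded upward) is an assumption, since the text does not discuss that case.

```python
def to_thousands(amount):
    """Round a whole-dollar amount to the nearest thousand, expressed in thousands.

    Assumption: an amount exactly halfway between two thousands is rounded up.
    """
    return (amount + 500) // 1000


first = to_thousands(2_648_302)    # 2648, shown as 2,648 under "thousands of dollars"
second = to_thousands(7_226_782)   # 7227, shown as 7,227
labels = [f"{value:,}" for value in (first, second)]
```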

No serious error is ordinarily introduced by rounding. If each of a 
series of numbers is rounded, some will be raised and some will be lowered, 
but the errors so introduced tend to offset each other. Furthermore, it 
may be felt that to show all the digits of a large number is to give the 
appearance of spurious accuracy. For example, the population of the 
United States was ascertained to be 150,697,361 persons in 1950, but the 
figure could hardly be accurate to units or even to hundreds. However, 
it may be maintained that the figure 150,697,361 is the one obtained by 
the best methods available and is therefore probably more accurate than 
any rounded figure. Irrespective of the merits of these two points of 
view, six (or fewer) significant figures may often be accurate enough for 
the comparisons desired. Further mention of rounding (and of significant 
digits) is made on pages 139-140 and in Appendix T.

When computed values, such as totals, percentages, and averages, are
to be shown in tables of rounded figures, these values should, if possible,
be calculated from the original figures before rounding.
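
A small illustration may show why the order of operations matters. The three amounts below are hypothetical and were chosen so that the rounding errors all fall in the same direction; the helper function is our own.

```python
def to_thousands(amount):
    """Round a whole-unit amount to the nearest thousand, expressed in thousands."""
    return (amount + 500) // 1000


# Hypothetical original figures, each of which rounds downward.
originals = [1_400, 1_400, 1_400]

total_from_originals = to_thousands(sum(originals))            # 4,200 rounds to 4
total_from_rounded = sum(to_thousands(x) for x in originals)   # 1 + 1 + 1 = 3
```

The total computed from the original figures (4 thousand) is the one that should be shown, even though the rounded entries in the column add to 3.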

Totals. We have previously noted that totals, when of major impor- 
tance, may be placed at the top in the stub and at the left in the caption. 
When it is not desired to emphasize totals, they may be placed at the bot- 
tom in the stub and at the right in the caption. 

Table 3.7 carries both total columns and a total row. An arrangement 
such as this results in a single number (1,127,000) which is sometimes
termed a “grand total” or a “checked grand total.” The fact that the
figures yield the same sum when added vertically and horizontally is not a 
positive check, since two or more compensating errors may have been 
made. That, however, does not often happen. We do have definite
proof either that no errors were made or that more than one was made. 
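
The logic of the checked grand total may be sketched as follows. The small table and the erroneous totals are hypothetical; the point is only that a single error in the entered totals is detected, while two compensating errors are not.

```python
def grand_totals_agree(row_totals, column_totals):
    """Compare the sum of the row totals with the sum of the column totals."""
    return sum(row_totals) == sum(column_totals)


# Hypothetical body of a table with two rows and two columns.
body = [[3, 7], [9, 11]]
correct_rows = [sum(row) for row in body]           # [10, 20]
correct_columns = [sum(col) for col in zip(*body)]  # [12, 18]

one_error = [11, 20]    # first row total entered 1 too high
two_errors = [11, 19]   # a second error that exactly offsets the first

check_correct = grand_totals_agree(correct_rows, correct_columns)  # True
check_one = grand_totals_agree(one_error, correct_columns)         # False
check_two = grand_totals_agree(two_errors, correct_columns)        # True, yet wrong
```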

Units. The units of measurement of the figures in a column or a row
of a table may often be self-explanatory. When this is not true, the 
nature of the unit should be made clear in the stub or the box head, as in 
Table 7.1. If the explanation applies to all figures in the table, it may 
appear as a prefatory note. Data of monetary units are usually self- 
descriptive, because of the use of the dollar sign. Note, in Table 3.6, 
that this sign appears for only the first entry in a column. 

Size and shape of table. In general, a table should be designed so 
that it will be neither very long and narrow nor very short and wide. A 
table must also be adjusted to the space in which it is to appear. Usually 
this limitation takes the form of a page of a book or a report. Of course, 
a table need not occupy the entire length or width of a page. If the table 
is too large for the allotted space, it may be recast into several smaller 
tables. Reduction of type size may permit a table to be included on a 
page, but reduction should not be made at the expense of legibility. If 
the use of a folded page is not desirable, the table may be arranged to 
occupy two facing pages. Because of the difficulty of aligning pages 
perfectly in binding, the stub is often repeated on the second page. 
When reference tables are continued over several pages, they may be split 
either vertically or horizontally. In either case, complete stub and 
caption entries should appear on each page, the title should be repeated 
on each page, and footnotes may appear at the bottom of the appropriate 
page or may be accumulated at the end of the table. 

The horizontal dimension of a table may be determined by allowing for: 

(1) Width of stub, determined by longest entry. (A very long entry 
may be put on two or more lines to save space; see the last item in the 
stub of Table 3.7.) 

(2) Width of each column, determined by largest number or by entry
in each box head. (By hyphenating words, an entry in a box head may
be compressed horizontally and expanded vertically.)

(3) Ruling. 

(4) Margins. 

The vertical dimension may be ascertained by considering: 

(1) Space needed for title, prefatory note, footnotes, and source note. 
Since the first line of the title should not exceed the table in width, a long 
title may require several lines. 

(2) Number of lines needed for the heading, in the stub or box head, 
which requires the most vertical space. 

(3) Number of rows in body of table. 

(4) Ruling. 

(5) Margins. 
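
The horizontal allowances listed above lend themselves to a simple calculation. The sketch below measures width in typewriter characters; the allowances of one character per ruling and two per margin are assumptions made only for illustration, as are the function and variable names.

```python
def table_width(stub_entries, columns, ruling=1, margin=2):
    """Estimate the width of a table in typewriter characters.

    `columns` is a list of (box_head_lines, body_entries) pairs. Each column
    is as wide as its widest box-head line or its widest body entry,
    whichever is larger. One ruling allowance is added per column and one
    margin on each side of the table.
    """
    stub_width = max(len(entry) for entry in stub_entries)
    column_widths = [
        max(max(len(line) for line in head), max(len(e) for e in entries))
        for head, entries in columns
    ]
    return stub_width + sum(column_widths) + ruling * len(columns) + 2 * margin


stub = ["Maine", "New Hampshire"]
per_cent_column = (["Per cent", "foreign", "born"], ["8.2", "10.9"])

# Box head "population" written on one line: the head governs the column width.
wide = table_width(stub, [(["White", "population"], ["910,846", "532,276"]), per_cent_column])

# The same box head hyphenated over two lines: the figures now govern the width.
narrow = table_width(stub, [(["White", "popula-", "tion"], ["910,846", "532,276"]), per_cent_column])
```

Under these assumptions the first layout requires 37 characters and the second 34, which illustrates how hyphenating a box head trades width for height.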

Ruling. Most of the tables in this text are shown with single-line 
ruling and are open at the sides. Double-line ruling is sometimes used, 
but double lines seem to make either hand-ruled or printed tables appear 
somewhat complicated. Tables are rarely closed at the sides, and should 
never appear with one side closed and one open. 

There seems to be a growing tendency to use text tables without ruling, 
either vertical or horizontal. Table 3.8 shows how Table 3.5 might
appear when no ruling is used. 


TABLE 3.8

White Population and Foreign-born White Population of the
New England States, 1950

                                    Foreign-born    Per cent
                        White          white        foreign
State                 population     population       born

Massachusetts          4,611,503       713,699        15.5
Connecticut            1,952,329       297,859        15.3
Rhode Island             777,015       113,264        14.6
New Hampshire            532,276        58,134        10.9
Maine                    910,846        74,342         8.2
Vermont                  377,188        28,753         7.6

Data from U. S. Bureau of the Census, U. S. Census of Population: 1950, Vol.
II, Characteristics of the Population, Part I, United States Summary, p. 1-108.
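
The “per cent foreign born” column of Table 3.8 can be reproduced from the two population columns. The sketch below uses the figures of the table; the variable names are our own.

```python
# (state, white population, foreign-born white population), from Table 3.8.
new_england = [
    ("Massachusetts", 4_611_503, 713_699),
    ("Connecticut", 1_952_329, 297_859),
    ("Rhode Island", 777_015, 113_264),
    ("New Hampshire", 532_276, 58_134),
    ("Maine", 910_846, 74_342),
    ("Vermont", 377_188, 28_753),
]

# Per cent foreign born, correct to tenths of one per cent.
per_cent = [round(100.0 * foreign / white, 1) for _, white, foreign in new_england]

# The rows of the table are arranged in descending order of this percentage.
is_descending = per_cent == sorted(per_cent, reverse=True)
```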


An examination of tables in this book and elsewhere will show that:

(1) No horizontal lines are used in the body of a table except to set off
totals and occasionally to separate a table into distinct parts. 

(2) Horizontal lines separating major and minor box heads do not continue
into the stub heading.

(3) All vertical lines separating box heads appear only between the box
heads which they separate; they do not extend above these box heads.

Guiding the eye. Skipping a line every three, four, or five rows, as in
Table 3.7, makes it easier for the eye to follow the rows across a table. 
The use of leaders in the stub of a table is also helpful. 

Zeros. It is not customary to show a zero in a table (other than a 
computation form). When no cases have been found to exist or when the 
value of an item is zero, the fact may be indicated by means of dots (...) 
or short dashes (–). When there is no figure for an entry because
information is lacking, a footnote should be used to indicate that fact.
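
When a table is prepared mechanically, the convention just described may be applied by a small formatting routine. The sketch below is illustrative only: the use of `None` to stand for lacking information and the footnote mark “(a)” are our own assumptions.

```python
def table_entry(value, missing_mark="(a)"):
    """Format one body entry of a table for display.

    None stands for "information lacking" and is replaced by a footnote
    reference mark; a zero is shown as dots; any other whole number is
    shown with commas separating the thousands.
    """
    if value is None:
        return missing_mark
    if value == 0:
        return "..."
    return f"{value:,}"


row = [table_entry(v) for v in (1250, 0, None)]
```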

Size and style of type. Too much variety in size or style of type (or
lettering) is not desirable. In general the title should be most prominent
and is usually set in large and small capitals or in boldface type. The 
items listed in the stub and caption and the figures in the body of the table 
are usually set in the same size type. Footnotes, prefatory note, and
source note are generally set in smaller type than that used in the body of
the table. 


STATISTICAL REPORTS 

When making a statistical report, the method of preparing the tables 
will be dictated partly by the number of copies of the report required and 
partly by the cost involved. Tables may be handwritten, typewritten, 
mimeographed, multigraphed, reproduced by a photostatic or photo- 
graphic process from handwritten or typed tables, or printed. 

There is a distinct disadvantage in the use of the ordinary typewriter 
for preparing other than relatively simple tables, because of the lack of 
flexibility of spacing and of size of type. Table 3.9 shows a table without 
ruling, prepared on an ordinary typewriter with pica type. Table 3.10 
presents the same data and indicates how ruling may be done on a typewriter.
Note that more flexibility was obtained by using two typewriters,
one with pica and one with elite type. By using elite type for the stub
entries and the body, a certain amount of space may be saved. Somewhat
more flexibility in planning a table may be had by using a typewriter
with variable spacing and with different kinds and sizes of type.

If only a few copies of a report are required and if the tables are simple, 
the tables and accompanying text may be typed and carbon copies made. 
If several dozen copies are needed, the longhand or typed material may 
be photostated at a cost of about 25 cents per 8½ X 11-inch page. By
this method, reduction or enlarging is possible and copies may be had 
rather promptly, since no plate need be made. If a larger number of 
copies is required, resort may be had to mimeographing or multigraphing.
Tables may also be reproduced by a photo-offset process, which is quite 
satisfactory and is often cheaper than printing because typesetting is 
avoided. Enlarging or reduction is possible; typed material may be 
reduced so that 4 ordinary 8½ X 11-inch pages (pica type) will appear on
one page. It should be noted that the typed copy should be a first-class
job if satisfactory reproductions are to be obtained.

Occasionally the gelatin-pan method may be useful when only a few
copies are needed. A special ink is available for handwritten material
and for illustrations; also, ribbons and carbon paper may be obtained for
typed material. This method is hardly so satisfactory as those above
mentioned, but it permits the making of a few copies by anyone with
rather inexpensive equipment. Enlarging and reducing are impossible.

TABLE 3.9

NUMBER OF INHABITANTS IN THE STATES OF THE MOUNTAIN DIVISION,
BY SEX, 1940 AND 1950

                       Male                   Female
                 1950       1940         1950       1940

Colorado        665,149    568,778      659,940    554,518
Arizona         379,059    258,170      370,528    241,091
Utah            347,636    278,620      341,226    271,690
New Mexico      347,544    271,846      333,643    259,972
Montana         309,423    299,009      281,601    260,447
Idaho           303,237    276,579      285,400    248,294
Wyoming         154,853    135,055      135,676    115,687
Nevada           85,017     61,341       75,066     48,906

Data from U. S. Bureau of the Census, U. S. Census of Population:
1950, Vol. II, Characteristics of the Population, Table 13 in the
Part for each state.

TABLE 3.10

NUMBER OF INHABITANTS IN THE STATES OF THE MOUNTAIN
DIVISION, BY SEX, 1940 AND 1950



CHAPTER 4 


Graphic Presentation I: 

CURVES USING ARITHMETIC SCALES 


THE GRAPHIC METHOD 

Attention has already been given to the presentation of statistical data 
by means of text, tabular, and semi-tabular devices. Ordinarily, sta- 
tistical data will be presented in the form of either a table or a chart. 
This chapter and the two which follow are devoted to a discussion of the 
portrayal of statistical data by graphic devices. As will be readily seen 
from a perusal of the pages of this book, charts or graphs are more effec- 
tive in attracting attention than are any of the other methods of present- 
ing data. Readers are therefore not so likely to skip a chart as to skip a 
table. A simple, attractive, well-constructed graph, showing a limited
set of facts, is also easier to understand than is a table. 

The outstanding effectiveness of a chart for presenting a limited 
amount of data makes it a most useful statistical tool. Certain limitations
should be noted, however. In the first place, charts cannot show
so many sets of facts as may be shown in a table. Numerous columns 
and rows may appear in a table; but imagine Chart 4.2 with six or eight 
criss-crossing and intertwining lines, and it is immediately obvious why a 
chart should show only a limited amount of information. In the 
second place, although exact values can be given in a table, only approxi- 
mate values can ordinarily be shown by a chart. In a table we may 
enter as many digits as are desired, but we can plot only the approximate 
value on a chart. For example, while the data upon which Chart 4.2 is 
based could be recorded in a table in terms of bales, a chart could show 
only thousands, or at best hundreds, of bales. Thus charts are useful for 
giving a quick picture of a general situation, but not of details. In the 
third place, charts require a certain amount of time to construct, since
each one is an original drawing. This difficulty, however, is offset by the
added effectiveness which the chart possesses in comparison with a table.¹

TYPES OF CHARTS 

In this text we shall discuss: curves or line diagrams; bar charts, involving
one-dimensional comparisons; area diagrams, involving two-dimensional
comparisons (including particularly pie diagrams, which involve one- or
two-dimensional comparisons, or comparisons of angles); volume diagrams,
which call for a visualization of the third dimension and three-dimensional
comparisons; pictographs, which involve aspects of both volume diagrams
and bar charts; and statistical maps. Other specialized types of charts
and certain charts which are graphic but not statistical (for example,
organization and procedure charts) are not treated here, but are discussed
in books on graphic methods. This chapter will consider only curves
using arithmetic scales. In the following chapter attention will be given
to curves using a logarithmic vertical scale and an arithmetic horizontal
scale. Chapter 6 will include brief discussions of bar charts, area diagrams,
volume diagrams, pictographs, and statistical maps.

Chart 4.1. Axes for Curve Plotting.

¹ William Playfair, who is understood to have “invented outright” the graphic
method in the latter part of the 18th century, says: “The advantage proposed by this
method, is not that of giving a more accurate statement than by figures, but it is to
give a more simple and permanent idea of the gradual progress and comparative
amounts, at different periods, by presenting to the eye a figure [chart], the proportions
of which correspond with the amount of the sums intended to be expressed.” See
the article “Playfair and His Charts,” by H. Gray Funkhouser and Helen M. Walker,
in Economic History, February 1935, pp. 103-109.

PLOTTING A CURVE 

When statistical data are shown as curves, the points are plotted in
reference to a pair of intersecting lines, called axes and shown in Chart
4.1. The horizontal line is known as the “X-axis” and the vertical line is
designated as the “Y-axis.” Positive values are shown to the right of
zero on the X-axis and above the zero on the Y-axis; negative values are
placed to the left of zero on the X-axis and below the zero on the Y-axis.
The point at which the two axes intersect is zero for both X and Y and
is referred to as the “zero point,” the “point of origin,” or merely the
“origin.” The positive and negative values on the axes increase as we
move away from this origin.

[Chart 4.2: vertical scale, millions of bales; horizontal scale, years 1925 to 1953.]

Chart 4.2. Production of Cotton in the United States, 1925-1953. Data
from U. S. Department of Agriculture, Agricultural Statistics, 1952, p. 76, and 1953,
p. 64, and The Cotton Situation, issued by the Agricultural Marketing Service of the
U. S. Department of Agriculture, May 27, 1954, p. 18.

The two axes of Chart 4.1 divide the plotting area into four sections
known as “quadrants.” For reference purposes, these quadrants are
designated I, II, III, and IV. Quadrant I accommodates values which
are positive on both the X- and Y-axes. Quadrant II provides for values
which are negative on the X-axis and positive on the Y-axis. Quadrant
III takes care of values which are negative on both axes. Quadrant IV
is for values which are positive on the X-axis and negative on the Y-axis.

[Chart 4.3: vertical scale, number of optometrists.]

Chart 4.3. Net Income of 1,764 Optometrists in 1951. Data from the
American Optometric Association. The frequencies for the last three plotted classes
are estimates.

Any point plotted in one of the quadrants may be located by referring
to its abscissa value, which is its horizontal or X distance from zero, and
to its ordinate value, which is its vertical or Y distance from zero. For
illustrative purposes four points have been plotted on Chart 4.1, one in
each quadrant: P₁ represents X = +4, Y = +2; P₂ indicates X = -3,
Y = +3; P₃ is X = -4, Y = -3; P₄ shows X = +3, Y = -2.

When the axes are used as bases of reference for plotting equations, any
or all of the quadrants may be used, since many equations may call for
negative values of X or of Y, or of both. At present, however, we are not
interested in the graphic representation of equations, but in graphically
portraying observed statistical data. When we are dealing with statistical
data, it must be obvious that both the X and Y variables are
ordinarily positive quantities, and that therefore we shall generally use
only the quadrant designated as I. Chart 4.2, showing the production
of cotton in the United States over a period of years, is an example of a
curve lying wholly in quadrant I.
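
The designation of the quadrants may be expressed as a simple rule on the signs of X and Y. The sketch below applies that rule to the four illustrative points of Chart 4.1; the function name is our own, and the treatment of points lying on an axis (which belong to no quadrant) is an assumption added for completeness.

```python
def quadrant(x, y):
    """Return the quadrant (I, II, III, or IV) in which the point (x, y) lies.

    A point lying on either axis belongs to no quadrant; None is returned.
    """
    if x == 0 or y == 0:
        return None
    if x > 0:
        return "I" if y > 0 else "IV"
    return "II" if y > 0 else "III"


# The four points plotted on Chart 4.1.
points = {"P1": (4, 2), "P2": (-3, 3), "P3": (-4, -3), "P4": (3, -2)}
located = {name: quadrant(x, y) for name, (x, y) in points.items()}
```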

Quadrants II and IV are occasionally used in conjunction with quad- 
rant I. Chart 4.3 shows a curve which makes use of quadrants I and II; 
the curve of Chart 4.4 lies partly in quadrant I and partly in quadrant IV.
Since both X and Y values are negative in quadrant III, that quadrant
is very rarely used. 

TYPES OF DATA SHOWN BY CURVES 

It was noted earlier that statistical data may be classified according 
to chronological, geographical, quantitative, or qualitative characteristics. 
Curves are frequently used for picturing time series and for showing fre- 
quency distributions (by far the most important sort of quantitatively 
classified data), although, of course, other types of graphs are also appli- 
cable as shown in the following chapters. Qualitatively and, especially, 
geographically classified data are rarely depicted by curves; instead, bar 
charts and other devices are used, as will be indicated hereafter. 

Time series curves. The method of plotting time series depends 
upon the type of data to be represented. We may distinguish between 
period data and point data. Period data, such as total sales per month, 
average monthly sales per year, and average prices during the year, refer 
to a period of time. Point data are those, such as inventory values, price 
quotations, or temperature readings, which refer to a particular point of
time. Whenever chronological data are depicted by means of a curve, 
the years, months, weeks, days, or other chronological units are shown on 
the horizontal axis; the other series, which varies with time, is placed on 
the vertical axis. 

Charts 4.2 and 4.22 show period data. When annual data of this type 
are plotted, the dates on the horizontal scales may be placed below the 
vertical lines, as in Chart 4.2, or below spaces, as in the left-hand part of 
Chart 4.22. Either method may be used; one argument for labeling the 
spaces is that this gives a visual impression of time as having duration. 
When monthly (and daily, weekly, or quarterly) data are plotted for a
number of years, there is no choice but to label the spaces representing
each year, since, if the lines were labeled, it would not be immediately
obvious to all readers whether the label referred to the space preceding
the line, the space following the line, or possibly half of the space on
each side. Each horizontal year-space is divided into 12 parts for the
plotting of the monthly figures, and these figures may be plotted at the
middle of each of the 12 spaces. Chart 4.4 illustrates this for period data
on a monthly basis.

[Chart 4.4: vertical scale, excess of arrivals over departures, in thousands.]

Chart 4.4. Net Arrivals and Departures of United States Citizens, January
1947-December 1952. Data from U. S. Department of Commerce, Office of Business
Economics, Business Statistics, 1951, p. 114, and 1953, p. 118. Beginning January
1951, all travel over international land borders was excluded from the figures of
arrivals and departures; see note 4 on page 246 of Business Statistics, 1953.

When point data are being represented by a curve, spaces, rather than 
lines, should be labeled on the horizontal axis and the observations should 
be plotted within the spaces at the point in time to which the data refer. 
This latter consideration is more important for annual data than for 
monthly data. However, for monthly data we should, ideally, (1) plot 
beginning-of-the-month data (such as figures of cold-storage holdings as 
of the first of each month) at the beginning of each space representing a 
month, (2) plot middle-of-the-month data (for example, payroll data for 
the payroll nearest the fifteenth of each month) at the middle of each 
space, and (3) plot end-of-the-month data (such as money in circulation 
at the end of each month) at the end of each space. This is illustrated 
in the three parts of Chart 4.5. If this procedure is not followed, the
appearance of a curve of monthly data is not altered; the curve is merely
shifted to the left or to the right.
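
The three placements may be stated as horizontal positions within a year-space divided into 12 parts. The sketch below is a simplification: it treats all months as of equal width, and the names used are our own.

```python
# Position of the plotted point within its month-space, as a fraction of the space.
OFFSETS = {"beginning": 0.0, "middle": 0.5, "end": 1.0}


def plot_position(month, timing):
    """Horizontal position within a year-space divided into 12 equal parts.

    `month` runs from 1 (January) to 12 (December). The result is measured
    in month-widths from the left edge of the year-space.
    """
    return (month - 1) + OFFSETS[timing]


january = [plot_position(1, t) for t in ("beginning", "middle", "end")]
```

Under this scheme an end-of-January figure and a beginning-of-February figure fall at the same horizontal position, as they should, since they refer to virtually the same point of time.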


[Chart 4.5, in three panels: A. Beginning-of-the-month data. B. Middle-of-the-month
data. C. End-of-the-month data.]

Chart 4.5. Methods of Plotting Monthly Point Data. Each small chart
represents the twelve months of a year.


Curves of frequency distributions. The curve of Chart 4.3 is a 
graphic representation of a frequency distribution. Frequency distribu- 
tions will not usually continue into the second quadrant as does this one. 
In this instance, however, there were some negative incomes. 

Table 4.1 shows a frequency distribution² of the grades of the 1952
graduating class of the United States Merchant Marine Academy. In
order to show the genesis of the frequency distribution curve, the data
are first represented by a series of rectangles or bars in the “column
diagram” of Chart 4.6. It will be noticed that the grades have been
placed along the horizontal axis and the frequencies (number of
cadet-midshipmen) along the vertical axis. There are as many columns in the
chart as there were classes in the table, and the height of each column
represents the frequency for the corresponding class. This column
diagram is transformed into a curve by connecting the midpoint of the
top of each rectangle with the midpoint of the top of each adjacent
rectangle, as shown by the broken line in Chart 4.6. This is done upon
the assumption that the values in a class interval are evenly distributed
throughout the class. The mid-value of a class is consequently
taken as representing the class.³ It will be observed that the dotted line
cuts off some small triangular pieces of the original rectangles and that it
also includes some small triangles not formerly included, but it is obvious
that triangle A = triangle A′, triangle B = triangle B′, and so forth.
Sometimes the curve is continued at each end to join the X-axis (indicating
a frequency of zero) at the mid-value of the next possible class.
This procedure results in having the same area under the curve as is
included in the rectangles. However, the result may sometimes be a
curve which extends beyond zero on the X-axis, and this is apt to be
meaningless. In any event the extensions suggest to the reader that
items occurred beyond the limits of the observed data. Except for
special purposes (see Chart 23.14), it is better not to extend the curve to
the X-axis. The frequency distribution may be shown either as a column
diagram or as a frequency curve (frequency polygon). The latter is
more usual and the curve is plotted directly, as in Chart 4.7, without the
intermediate step of constructing columns.

TABLE 4.1

Frequency Distribution of Grades Received
for the Four-Year Course by 225 Cadet-
Midshipmen of the 1952 Graduating
Class of the United States
Merchant Marine Academy

                 Number of cadet-
Grade              midshipmen

72.0-73.9               7
74.0-75.9              31
76.0-77.9              42
78.0-79.9              54
80.0-81.9              33
82.0-83.9              24
84.0-85.9              22
86.0-87.9               8
88.0-89.9               4

Total                 225

Data from United States Merchant Marine
Academy.

[Chart 4.6: vertical scale, number of cadet-midshipmen; horizontal scale, grade.]

Chart 4.6. Grades Received for the Four-Year Course by 225 Cadet-
Midshipmen of the 1952 Graduating Class of the United States Merchant
Marine Academy, Shown by a Column Diagram and by a Frequency
Curve. Data of Table 4.1.

[Chart 4.7: vertical scale, number of cadet-midshipmen; horizontal scale, grade,
marked from 70 to 92 by twos.]

Chart 4.7. Grades Received for the Four-Year Course by 225 Cadet-
Midshipmen of the 1952 Graduating Class of the United States Merchant
Marine Academy. Data of Table 4.1.

² Frequency distributions are discussed in Chapter 8.

³ This point is discussed at greater length in Chapter 9.
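
The statement that the extended curve encloses the same area as the rectangles can be verified from the data of Table 4.1. The sketch below assumes that each class runs from its lower limit up to the lower limit of the next class, so that the class interval is 2.0 and the first mid-value is 73.0; that reading of the class limits, and the variable names, are our own.

```python
# Data of Table 4.1: lower class limits 72.0, 74.0, ..., 88.0 and their frequencies.
width = 2.0
lower_limits = [72.0 + width * i for i in range(9)]
frequencies = [7, 31, 42, 54, 33, 24, 22, 8, 4]

# Mid-value of each class, taken as representing the class.
midpoints = [lower + width / 2 for lower in lower_limits]

# Area of the column diagram: one rectangle per class.
column_area = sum(f * width for f in frequencies)

# Frequency polygon continued at each end to a frequency of zero at the
# mid-value of the next possible class.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

# Area under the polygon, as a sum of trapezoids between adjacent plotted points.
polygon_area = sum(
    (x2 - x1) * (y1 + y2) / 2
    for (x1, y1), (x2, y2) in zip(zip(xs, ys), zip(xs[1:], ys[1:]))
)
```

Both areas come to 450 (225 cadet-midshipmen times a class interval of 2), which is the equality of triangles described above expressed in figures.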

Sometimes frequency distributions are encountered which refer to such 
information as number of children in a family, number of automobiles 
parked in a block, or other data which can have only values that are 
integers (0, 1, 2, 3, etc.). Frequency distributions dealing with variables 
of this sort, which we shall identify in Chapter 8 as “discrete,” are
generally shown by a column diagram, rather than by a curve. Chart 
23.12, showing the data of Table 23.7, illustrates this point; the sepa- 
ration of the bars serves to emphasize the lack of continuity which is 
present.


RULES FOR DRAWING CURVES

While statisticians have not agreed upon a standard procedure setting
forth in detail exactly how line diagrams should be constructed, there are
certain rather obvious considerations of importance. The student who
is interested in going into more detail in regard to the technique of chart
construction is referred to a book dealing solely with that topic.⁴

⁴ For example, Mary E. Spear, Charting Statistics, McGraw-Hill Book Co., Inc.,
New York, 1952. Also W. C. Brinton, Graphic Presentation, Brinton Associates,
New York, 1939.

Zero on vertical scale. The inclusion of a zero on the vertical scale
of a curve is perhaps one of the most important rules. Chart makers
all too frequently neglect to observe this principle and the result is always
misleading, since the visual impression is incorrect. In Chart 4.2 the
production of cotton in the United States from 1925 to 1953 was plotted
with reference to a vertical scale beginning with zero. The same series
of data appears in Chart 4.8, but on this chart the vertical scale begins at
8,000,000 bales. Chart 4.8 gives the reader a visual impression which is
quite contrary to the facts. For example, production in 1949 appears to
have been about 10 times that for 1946, whereas Chart 4.2 shows clearly
that 1949 production was only about twice as large as 1946 production.
Very few readers notice the omission of zero on a vertical scale, and fewer
still are apt to make due allowance for the omission in interpreting a
curve. It should not be necessary for a reader to refer to a scale in order
to make approximate comparisons; the chart should be so drawn that
visual comparisons may be made as quickly as possible.

[Chart 4.8: vertical scale, millions of bales.]

Chart 4.8. Production of Cotton in the United States, 1925-1953. This
chart is incorrectly drawn, since the vertical scale begins with 8 and there is no clear
indication of the omission of the zero. Data from sources given below Chart 4.2.
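
The distortion produced by omitting the zero can be put in figures. In the sketch below the two production values are hypothetical round numbers, chosen only to be consistent with the description in the text; they are not read from the source data. The function name is our own.

```python
def apparent_ratio(a, b, scale_start=0.0):
    """Ratio of the plotted heights of a and b when the vertical scale begins at scale_start."""
    return (a - scale_start) / (b - scale_start)


# Hypothetical values, in millions of bales (not taken from the source data).
production_1946 = 8.8
production_1949 = 16.0

# Scale beginning at zero: the heights are in the true proportion, nearly 2 to 1.
true_ratio = apparent_ratio(production_1949, production_1946)

# Scale beginning at 8 million bales: the later year appears about 10 times as large.
distorted_ratio = apparent_ratio(production_1949, production_1946, scale_start=8.0)
```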

Showing the zero as in Chart 4.2 would sometimes result in placing the
curve high up on the grid and might also make the movements of the curve
difficult to discern. Therefore, the omission of the zero on the vertical
scale of a chart usually occurs because the person constructing the chart
wishes to emphasize the movements of the curve and feels that the space
between the curve and the X-axis is useless. There are several ways in
which it is possible to show the zero (or to indicate clearly its omission),
and also to avoid placing the curve high up on the chart. Chart 4.9
shows a method in which a definite break is made across the chart. Sometimes
the parallel lines are serrated (notched) instead of wavy. They
may be drawn freehand or, as in Chart 4.9, by making use of a bread
knife as a ruler. Charts 4.10, 4.11, and 4.19 show other devices which are
occasionally used. Notice that Charts 4.9 and 4.19 show the zero and a
scale break, while Charts 4.10 and 4.11 do not show the zero but merely
call attention to the fact that the vertical scale is incomplete.

[Chart 4.9: vertical scale, millions of bales.]

Chart 4.9. Production of Cotton in the United States, 1925-1953. Data from
sources given below Chart 4.2.

Chart 4.12 appeared in the annual report of a large corporation.
Because no warning is given of the omission of the zero on the vertical
scale, this chart gives a misleading visual impression of the decrease in
bonds and notes outstanding. Unless the vertical scale is consulted, the
reader may conclude that outstanding bonds and notes have been nearly
eliminated.

Occasionally curves will be seen which lack a zero on the vertical scale
and which show the growth of sales of a commodity, membership in an
organization, circulation of a periodical, or other data. The omission
of the zero makes the growth appear to be much more rapid than it really
has been.

Chart 4.10. Production of Cotton in the United States, 1925-1953. Data
from sources given below Chart 4.2.

[Chart 4.11: vertical scale, millions of bales.]

Chart 4.11. Production of Cotton in the United States, 1925-1953. Data
from sources given below Chart 4.2.

[Chart 4.12: vertical scale, millions of dollars.]

Chart 4.12. Outstanding Bonds and Notes of a Large Corporation, 1941-1952.
From an annual report.

[Chart 4.13: vertical scale, index number.]

Chart 4.13. Consumers Price Index of Food in the United States, 1935-1953.
1947-1949 = 100. Data from Monthly Labor Review, February 1954, p. 236.

Chart 4.13 shows index numbers of the retail prices of food. This
chart is unusual in two respects. In the first place, it carries a zero for
the vertical scale, which, though not wrong, is not necessary when price
index numbers are being plotted, because it is hardly conceivable that
prices will ever approach zero and because 100 is the base of the index
number. The 100 line should always be emphasized when it is the base,
as in this chart. Similarly the zero line should be emphasized, as in
Chart 4.9, when it is the base of the chart. When charting index numbers,
some persons prefer to show the fluctuations above and below 100 in
terms of positive and negative values. In the case of Chart 4.13, 100
would become zero, 120 would become +20, and 75 would become -25.
The vertical scale of Chart 4.13 would be altered to read +20, 0, -20,
-40, -60, -80, and -100. The curve itself would remain unchanged.
The second unusual feature of Chart 4.13 is the treatment of the hori- 
zontal and vertical guide lines, which results in giving the curve an 
unusually clear profile. Notice also that space has been left to add later 
data. This practice allows the same original chart to be reproduced 
time after time by merely extending the curve as new data become 
available. 
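
The relabeling of an index scale in terms of positive and negative values is simple arithmetic, and may be sketched in a few lines of Python. This is an illustration only; the function name `deviations_from_base` is an assumption of the sketch and does not come from the text.

```python
def deviations_from_base(values, base=100):
    """Express index numbers (or scale labels) as signed deviations from the base."""
    return [value - base for value in values]


# Index values mentioned in the text: 100, 120, and 75.
print(deviations_from_base([100, 120, 75]))  # [0, 20, -25]

# Relabeling a vertical scale that runs from 120 down to 0 in steps of 20.
old_scale = [120, 100, 80, 60, 40, 20, 0]
print(deviations_from_base(old_scale))  # [20, 0, -20, -40, -60, -80, -100]
```

Only the labels change; the plotted points keep their positions, which is why the curve itself is unaltered.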

Ruling curves. The curve or curves representing the data should 
stand out clearly from the background of the chart. The curve should 
therefore be ruled more heavily than the coordinates. (When two or 
more curves are shown which follow each other closely or which inter- 
twine, it is sometimes necessary to use more lightly ruled lines for some of 
the curves. See, for example, Chart 17.3.) As will be seen from the 
various curves in this text, the plotted points are not usually shown, since 
the attempt is to present the general situation rather than the individual 
readings. 

When several curves are drawn on the same axis, it is important for the 
reader to be able to identify each curve. Thus we may use solid, dotted, 
and dashed lines, and we may use heavy and light lines. If a light line is 
used for a curve, it should ordinarily not be so light as the coordinates. 
The suggested rulings are listed below as A and B. 


A. These lines are recommended if not more than three curves are to be 
drawn. 

B. If more than three curves are to be drawn, these lighter lines may be 
used. 

C. These lines are not recommended unless plotted points are to be 
indicated by means of the circles or dots. 

When two or more curves appear on a chart, each should be clearly 
identified. This may be accomplished by labeling the curves as in Charts 
4.16, 4.21, and 17.3. 

It is ordinarily well to avoid the use of more than two or three curves 
on one chart. Particularly if they cross and re-cross, confusion is likely 
to result. When several curves appear on a large wall chart which is to be 
presented to a group, different colors may occasionally be used, though it 
is usually better practice to reserve the use of color for those occasions 
when special emphasis is to be placed on one or two curves. Black, red, 
green, light or medium blue, and medium or dark orange are readily 
distinguished. If there is a likelihood that the wall chart is to be photo- 
stated, photographed, or reproduced for printing, black and red may be 
used in solid and broken, light and heavy, combinations, since the red line 
will reproduce as black. Blue, yellow, and some shades of green photo- 
graph either not at all or faintly. Color is ordinarily too expensive to be 
used in a book. 

Coordinates. Chart makers emphasize the zero line by making it a 
little heavier than the other marginal lines. In similar fashion, a 100 per 
cent line (or other base with which comparisons are made) may be 
stressed. The marginal vertical and horizontal lines may be made slightly 
heavier than the other coordinate lines. 

The coordinate lines should be drawn very lightly. No more coordinate 
lines should appear than are necessary to assist in reading the chart. 
Occasionally all coordinates are omitted, as in Chart 4.4, which uses 
“tics” in lieu of coordinate lines. If it is desired to have a closely ruled 
grid in order to make plotting easy, the chart may be drawn on tracing 
cloth or tracing paper which has been placed over a grid which has the 
desired closely spaced coordinate lines. Alternatively, when a chart is to 
be reproduced, a closely ruled grid of light blue may be used. The lines 
which should appear in the reproduction are ruled in black. The blue 
lines of the background do not show up in the reproduction under ordinary 
conditions. Some of the charts in this text were drawn on such a light 
blue background. 

In order to insure a proper understanding of a chart, the two scales 
should be clearly labeled. Not only should the nature of the data be 
indicated, but the units used should also be stated. For example, in 
Chart 4.3 the horizontal axis shows incomes, the unit being thousands of 
dollars. Occasionally a curve of a long time series may be rather extended 
horizontally. In such instances it is sometimes desirable to repeat the 
vertical scale at the right of the chart. 

Chart proportions. It is hardly possible to give an objective rule as 
to the proper proportions for a curve diagram. It should be noted, however, 
that bizarre impressions result from over-expanding or over-contracting 
either scale used for a curve. In Chart 4.14 the vertical scale is exaggerated 
in relation to the horizontal scale; in Chart 4.15 the horizontal scale is 
exaggerated. The former gives an impression of tremendous fluctuations; 
the latter conveys the idea that cotton production has undergone relatively 
unimportant fluctuations. These two charts indicate distorted results of 
replotting the data shown properly in Chart 4.2. Rules of thumb are often 
unsatisfactory because they are apt to be adopted blindly. However, it has 
been suggested that the proper proportions are those which result in a 
45-degree angle for the movements of the curve which are to be emphasized. 

Chart 4.14. Production of Cotton in the United States, 1925-1953. The 
vertical dimension is exaggerated in relation to the horizontal dimension. 
Data from sources given below Chart 4.2. 

Just as it is possible to overemphasize or to minimize fluctuations by poor 
choice of scales, so it is possible to create misleading impressions in regard 
to growth. One curve of Chart 5.3 shows automobile registrations in the 
United States for 1917-1953. Expanding the vertical scale and contracting 
the horizontal scale would give a visual impression of very rapid growth of 
United States automobile registrations; contracting the vertical scale and 
expanding the horizontal scale would make the growth appear to have been 
very slow. 

Although the two preceding para- 
graphs referred to curves of time series, 
it should be understood that misleading 
visual impressions may be given by 
curves of frequency distributions, and 
by virtually any other type of chart, 
if one scale is over-expanded or is 
unduly contracted in relation to the 
other scale. 
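
The effect of chart proportions on the apparent steepness of a curve, and the 45-degree rule of thumb mentioned earlier, may be illustrated by a short Python sketch. Several things here are assumptions of the sketch and not statements of the text: the function names, the use of the median absolute slope to stand for "the movements which are to be emphasized," and the small data series, which is hypothetical.

```python
import math
from statistics import median


def drawn_angle(dx, dy, x_range, y_range, width, height):
    """Angle in degrees at which a data segment (dx, dy) appears on the chart."""
    run = dx / x_range * width      # horizontal distance on paper
    rise = dy / y_range * height    # vertical distance on paper
    return math.degrees(math.atan2(rise, run))


def height_for_45_degrees(xs, ys, width):
    """Chart height at which the median absolute slope is drawn at 45 degrees.

    Assumes xs is strictly increasing and ys is not constant.
    """
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    slopes = [
        abs((ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]))
        for i in range(len(xs) - 1)
        if ys[i + 1] != ys[i]
    ]
    typical = median(slopes)
    # Drawn slope = typical * (height / y_range) / (width / x_range) = 1.
    return width * y_range / (typical * x_range)


# Hypothetical series, not taken from any chart in the text.
xs = [0, 1, 2, 3]
ys = [0, 2, 1, 3]
height = height_for_45_degrees(xs, ys, width=6.0)
print(height)
print(drawn_angle(1, 2, 3, 3, 6.0, height))      # a typical segment
print(drawn_angle(1, 2, 3, 3, 6.0, 2 * height))  # same segment, taller chart
```

Doubling the height while holding the width fixed makes the same segment appear steeper, which is the distortion described for Chart 4.14.
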

Lettering. All lettering on a chart, including scale labels, scale 
values, legend, curve labels, and any other words or figures, should be 
placed horizontally, if possible. Occasionally space limitations may 
necessitate placing the vertical scale label in a vertical position (which 
may be the reason it was so placed in Chart 6.3), but such a limitation is 
not often present. Needless to say, all lettering should be legible. 
Freehand words and figures may be made very attractive when executed 
by a skilled person. The amateur may, however, make excellent formal 
letters and figures with a little practice by the use of stencil lettering 
devices available from artists’ or draftsmen’s supply houses. Nearly all 
of the charts in this text, except those reproduced from other publications, 
were lettered by means of such devices. The lettering inside of the border 
of Chart 17.1, and some of the other inserts on charts elsewhere in this 
book, was done by use of a typewriter having block type. 

Chart 4.16. Arrivals and Departures of United States Citizens, January 
1947-December 1952. Vertical scale in thousands of persons. For source of 
data, see Chart 4.4. The hatched areas represent excess of arrivals over 
departures; the stippled areas show excess of departures over arrivals. 

Title. Each chart, like each table, should have a title, which should 
state clearly and succinctly what the chart purports to show. The title 
of a printed chart may appear either above or below the chart, but 
preferably below. The titles of large wall charts are often placed above 
the grid or, sometimes, upon it. 

Source. Again, as in the case of a table, each chart should contain a 
source reference to indicate the author, title, volume, page, publisher, and 
date of the publication from which the data were taken. Naturally the 
cautions regarding comparability of data taken from the same source or 
different sources, mentioned in Chapter 2, apply with full force to the 
figures used for making charts. 

LINE DIAGRAMS FOR SPECIAL PURPOSES 

Net balance charts. Chart 4.4 shows one method of indicating the 
net total of two series. For each month, departures were subtracted 
from arrivals and the result plotted as a positive or negative figure. The 
balance of trade (value of exports minus value of imports) may be shown 
in the same manner, as may also profit and loss. An alternative method 
of showing the arrival and departure data is illustrated in Chart 4.16. 
Here the curves for arrivals and for departures are given; excess of 
arrivals is indicated by the height of the cross-hatched area, while the 
excess of departures is shown by the height of the stippled portion. 
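
The subtraction behind a net balance chart may be sketched as follows. The monthly figures are hypothetical (they are not the data of Chart 4.4), and the function names are assumptions made for the illustration.

```python
def net_balance(arrivals, departures):
    """Arrivals minus departures for each period; negative means net departures."""
    if len(arrivals) != len(departures):
        raise ValueError("both series must cover the same periods")
    return [a - d for a, d in zip(arrivals, departures)]


def describe(balance):
    """Name the kind of excess that would be shaded on the chart."""
    if balance > 0:
        return "excess of arrivals"
    if balance < 0:
        return "excess of departures"
    return "balanced"


# Hypothetical monthly figures, in thousands of persons.
arrivals = [50, 40, 30]
departures = [45, 40, 38]
for month, balance in enumerate(net_balance(arrivals, departures), start=1):
    print(month, balance, describe(balance))
```

The same function serves for a balance of trade (exports minus imports) or for profit and loss.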

Silhouette charts. Chart 4.16 (referred to in the preceding paragraph) 
illustrates not only the showing of net amounts rather than gross amounts, 
but likewise the practice of shading the area between two curves in order 
to obtain emphasis. Chart 4.17 is similar to Chart 4.4 in that it shows 
fluctuations above and below a base line. In Chart 4.17, however, the 
areas of the curve have been emphasized by filling in with black. The 
result is a more striking portrayal of the “plus” and “minus” parts of the 
curve. A chart of this type is even more effective when the “plus” areas 
are filled in with black and the “minus” areas are filled in with red. 

Maximum variation charts. The Library of Columbia University 
displayed in an illuminated glass case a number of valuable old prints. 
For the proper preservation of the prints it was desired to maintain the 
temperature between 70 and 80 degrees Fahrenheit. The problem consisted 
of adjusting radiation of heat from the case, ventilation and conduction, 
and the proximity to nearby radiators so that the temperature 
inside the case would remain within the desired limits. A recording 
thermometer was placed in the case and the temperature was continuously 
recorded over an extended period. In Chart 4.18 a four-day section of 
one of the charts is shown. During these days there was no heat in the 
adjoining radiator, and it may be seen that the temperature never fell 
below 70 degrees but did slightly exceed 80 degrees on several occasions. 
On Thursday, Friday, and Saturday the library was open to the public 
from 8 a.m. to 10 p.m.; on Sunday, from 2 to 6 p.m. The dashed lines 
have been added by the authors and serve to stress the limits beyond 
which the temperature should not fluctuate. 
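
The comparison of recorded values with fixed limits, which the dashed lines make visible on the chart, may also be made numerically. The sketch below is a minimal illustration; the readings are hypothetical and the function name is an assumption. Only the limits of 70 and 80 degrees come from the text.

```python
def readings_outside_limits(readings, lower=70.0, upper=80.0):
    """Return (position, value) pairs for readings beyond the desired limits."""
    return [
        (position, value)
        for position, value in enumerate(readings)
        if value < lower or value > upper
    ]


# Hypothetical readings in degrees Fahrenheit.
readings = [72.0, 79.5, 80.5, 76.0, 81.0]
print(readings_outside_limits(readings))  # [(2, 80.5), (4, 81.0)]
```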



Chart 4.18. Temperature Fluctuations in a Library Display Case. 
Temperature is in degrees Fahrenheit. The curved ordinates are made to 
correspond to the arc described by the recording pen of the thermometer. 
(From the Library of Columbia University.) 

Range charts. Chart 4.19 shows a device by means of which the 
range of stock prices may be depicted. It will be noticed that the black 
band expands when the range is greater and contracts when the range is 
smaller. The white line indicates the closing price. An alternative 
method of showing the same data is illustrated in Chart 4.20. Here the 
top of each bar represents the high for the day, while the bottom of each 
bar represents the low for the day. The line connecting the bars represents 
the closing price. Charts such as these may be used for showing 
commodity prices and other sorts of data if it is desired to show a range of 
variation over a period of time. 

Chart 4.19. High, Low, and Closing Prices of 50 Stocks as Shown by the 
New York Times Averages, January 5-February 27, 1953. Data from various 
issues of the New York Times. 

Chart 4.20. High, Low, and Closing Prices of 50 Stocks as Shown by the 
New York Times Averages, January 5-February 27, 1953. Vertical scale in 
dollars. Data from various issues of the New York Times. 

Z-charts. The Z-chart consists of three curves on the same axes as 
shown in Chart 4.21. Usually the chart covers a period of one year, by 
months. One curve shows the monthly figures, another shows the 
cumulative figures from the beginning of the year, while the third shows 
the total for the twelve months ending with each month. This last curve 
is generally called the moving annual total curve; more specifically, it is a 
12-month moving total for the twelve months ending with each designated 
month. Two vertical scales are used with the Z-chart, since, if the 
monthly data were plotted against the same scale as the other data, the 
fluctuations of the monthly data would not be apparent. The Z-chart 
is often used for internal business purposes, showing, for example, data 
of production and sales. It is, of course, limited to those situations in 
which the chart maker is interested in visualizing: (1) the figure for a 
given month, (2) the cumulative figure for that part of the calendar 
(or fiscal) year which has elapsed, and (3) the figure for the twelve months 
ending with each given month. 
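
The three series of a Z-chart may be computed from two years of monthly figures, as in the following Python sketch. The function name and the monthly values are assumptions made for illustration; they are not the Sears Roebuck data of Chart 4.21.

```python
from itertools import accumulate


def z_chart_series(previous_year, current_year):
    """Return (monthly, cumulative, moving_annual_total) for the current year.

    Each argument is a list of twelve monthly figures, January first.
    """
    if len(previous_year) != 12 or len(current_year) != 12:
        raise ValueError("each year must contain twelve monthly figures")
    monthly = list(current_year)
    cumulative = list(accumulate(monthly))
    combined = list(previous_year) + monthly
    # Total of the twelve months ending with each month of the current year.
    moving_total = [sum(combined[i + 1 : i + 13]) for i in range(12)]
    return monthly, cumulative, moving_total


# Hypothetical figures: 10 units a month last year, 12 units a month this year.
monthly, cumulative, moving_total = z_chart_series([10] * 12, [12] * 12)
print(cumulative[-1], moving_total[0], moving_total[-1])  # 144 122 144
```

In December the cumulative figure and the moving annual total coincide, which is why the two upper curves meet at the right edge and, with the monthly curve along the bottom, suggest the letter Z.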

Except for special purposes such as this, it is not usually desirable to 
use two, or more, vertical scales (sometimes referred to as “multiple 
scales”) on a chart of the type described in this chapter. The occurrences 
of fluctuations (but not their magnitudes) in two series expressed 
in different units may occasionally be compared on a chart having two 
different vertical scales. However, the use of two, or more, different 
vertical scales is likely to give false visual impressions of the comparative 
magnitudes of changes occurring in the various series. 

Chart 4.21. Sales of Sears Roebuck and Company: Monthly, Cumulative, 
and Moving Annual Total, 1953. Data from Survey of Current Business, February 
1953, p. S-10, and February 1954, p. S-10. 

Varying horizontal-scale charts. Occasionally it is desired to show 
annual data over a number of years, and monthly data for one or two 
more recent years. This may be done as in Chart 4.22, in which the horizontal 
scale is expanded to show the monthly data in more detail. Notice 
that the two parts of the chart are separated by a break. Similarly, a 
change in horizontal scale may be in order if we wish to show a combination 
of annual or monthly data with weekly data, or a combination of 
annual, monthly, or weekly data with daily data. 



Chart 4.22. Index of Ordinary Life Insurance Sales in 
New Jersey, Annually, 1937-1946, and Monthly, 1947-1953. 
Reproduced from Review of New Jersey Business, January 1954, 
p. 17. 

Multiple-axis charts. Occasionally it is desirable to compare the 
fluctuations of several curves and yet to have each curve stand out 
clearly. A simple method of accomplishing this result is to plot the differ- 
ent curves along different horizontal axes, these different X-axes being 
arbitrarily separated by convenient vertical distances. An illustration 
is Chart 14.5, which is also referred to as a year-over-year chart. 
Here the different curves have been brought close together for ease of 
comparison, but there is no crossing of the lines. Although different 
horizontal axes are employed, the vertical and horizontal scales remain 
the same. In interpreting such a chart on arithmetic graph paper (as 
distinguished from semi-logarithmic graph paper described in the following 
chapter), it should be remembered that the comparison afforded is 
that of absolute and not of relative changes. It is unlikely that the use 
of this type of chart will be found desirable for presentation to the general 
reader, unless the diagram is accompanied by a clear explanation. 

Component-part charts. Chart 4.23 shows the number of persons 
in the United States at each census from 1850 to 1950, in each of four 
general age groups. The height of each band indicates the number of 
each age in the country at a given census. It is possible to observe from 
this type of chart whether or not a given group is increasing or decreasing, 
and whether or not the total of all groups is increasing or decreasing. 
The relative importance of a particular group cannot be visualized from 
Chart 4.23, but in Chart 4.24 the age groups are shown according to the 
proportions which they constitute of the total population. Here it may 
be clearly seen that there has been a decrease in the proportion of younger 
persons and an increase in the proportion of older persons in the population. 
When component-part data covering a few years are to be shown 
graphically, a bar chart such as the upper part of Chart 6.17 or 6.18 may 
be used. When a number of years are to be shown, the general trend can 
be more easily pictured by curves. 

Chart 4.23. Population of the United States in Each Specified Age Group, 
1850-1950. Vertical scale in millions of persons. Data from U. S. Bureau of 
the Census, Fifteenth Census of the United States, 1930, Population, Volume II, 
p. 576, and U. S. Census of Population, 1950, Vol. II, Characteristics of the 
Population, Part 1, United States Summary, p. 1-93. 

Chart 4.24. Proportion of the Population of the United States in Each 
Specified Age Group, 1850-1950. Vertical scale in per cent. Data from sources 
given below Chart 4.23. 
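
The conversion from the absolute figures of a component-part chart to the proportions of a chart such as Chart 4.24 may be sketched as follows. The group labels and counts are hypothetical, chosen only to make the arithmetic easy to follow, and the function name is an assumption.

```python
def component_percentages(components):
    """Convert component amounts into percentages of their total."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return {name: 100.0 * amount / total for name, amount in components.items()}


# Hypothetical counts (millions of persons) for four age groups at one census.
census = {"group 1": 10, "group 2": 30, "group 3": 40, "group 4": 20}
shares = component_percentages(census)
print(shares)
print(sum(shares.values()))
```

Repeating the calculation for each census gives bands which always total 100 per cent, so that only the relative importance of each group, and not the growth of the total, is shown.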

Frequency distribution and range chart. Sometimes it is advantageous 
to show a frequency distribution curve for one set of data and to 
compare with that curve the range of values for another distribution. 
Chart 4.25 shows a frequency distribution of the average straight-time 
weekly earnings of 14,817 female secretaries in non-manufacturing 
industries in New York City in January 1952. A non-commercial organization 
was interested in knowing how its secretarial salaries compared 
with these and showed the range of its own salaries as indicated on the 
chart. Alternatively, two frequency distributions could have been shown, 
as in Chart 8.7. 

Chart 4.25. Weekly Earnings of 14,817 Female Secretaries in Non-Manufacturing 
Industries in New York City and Range of Pay for Female Secretaries 
in a Non-Commercial Organization, January 1952. The data of weekly 
earnings in New York City are from Table 8.5 and are “frequency densities,” which 
are explained in the discussion concerning Chart 8.5. 

THE SEMI-LOGARITHMIC OR RATIO CHART 


AMOUNT OF CHANGE VS. RATIO OF CHANGE 

When considering the development of a series of statistical data over a 
period of time, we are sometimes interested in the amount of change that 
has taken place, but more often we wish to know something about the 
ratio of change that has occurred between two dates. Diagrams such as 
Charts 4.2, 4.4, and various others in Chapter 4 are of the familiar type, 
having what are termed arithmetic scales, and are of use, primarily, for 
indicating absolute changes in the factor shown on the Y-axis. It is the 
purpose of this discussion to explain a slightly different sort of grid which 
enables one to visualize the ratio of change in a plotted series. 

TABLE 5.1 

An Arithmetic Progression 

Year          Y value     Amount 
(X value)                 of increase 

1946               0 
1947             200         200 
1948             400         200 
1949             600         200 
1950             800         200 
1951           1,000         200 
1952           1,200         200 
1953           1,400         200 


The ability of the usual type of chart to give a satisfactory visual 
impression of absolute change, but not of ratio of change, is brought out 
by Chart 5.1. Curve A represents a constant amount of increase of 200 
units per year (see Table 5.1), and this, or any other, arithmetic progression 
(constant amount of increase or decrease) will be depicted by a straight 
line when plotted on the conventional or arithmetic grid. Curve B, 
however, is the result of plotting a series of figures which begin with 128 
and increase 50 per cent each year (see Table 5.2). It will be noticed 
that this curve is not a straight line; the curve bends upward more and 
more sharply as time passes. 

Chart 5.1. An Arithmetic Progression (A) and a Geometric Progression (B) 
Plotted on an Arithmetic Grid. Data of Tables 5.1 and 5.2. 


TABLE 5.2 

A Geometric Progression 

Year          Y value     Per cent 
(X value)                 of increase 

1946             128 
1947             192          50 
1948             288          50 
1949             432          50 
1950             648          50 
1951             972          50 
1952           1,458          50 
1953           2,187          50 
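
The two progressions of Tables 5.1 and 5.2 may be generated by the short Python sketch below, which is offered only as an illustration. The function names are assumptions of the sketch; exact fractions are used so that the 50 per cent ratio introduces no rounding.

```python
from fractions import Fraction


def arithmetic_progression(first, amount, n):
    """n terms that increase by a constant amount."""
    return [first + amount * i for i in range(n)]


def geometric_progression(first, ratio, n):
    """n terms that increase by a constant ratio."""
    return [first * ratio ** i for i in range(n)]


table_5_1 = arithmetic_progression(0, 200, 8)
table_5_2 = geometric_progression(128, Fraction(3, 2), 8)  # 50 per cent a year

print(table_5_1)                    # [0, 200, 400, 600, 800, 1000, 1200, 1400]
print([int(v) for v in table_5_2])  # [128, 192, 288, 432, 648, 972, 1458, 2187]
```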


A series showing a constant ratio of increase or decrease is known as a 
geometric progression, and any geometric progression will yield a curved 
line when plotted on an arithmetic grid.¹ An increasing geometric progression 
is represented by a curve which slopes upward and is concave 
upward, as in Curve B of Chart 5.1; a decreasing geometric progression is 
represented by a curve which slopes downward and is concave upward. 
A serious difficulty in interpreting such curves, however, lies in the fact 
that the eye cannot discern whether a particular curved line does 
or does not represent a constant ratio of change. Chart 5.2 depicts a 
series which is neither an arithmetic nor a geometric progression. The 
data of Table 5.3 show that the series increases more rapidly than an 
arithmetic progression, and the eye can grasp this fact because the curve 
bends upward. The table also indicates that the ratio of increase of the 
series is not constant. Visually, however, this fact is not apparent. It 
is not possible for the reader of an arithmetic chart to be sure whether 
a curved line, such as this, represents a constant ratio of increase, a ratio 
of increase which is diminishing, or a ratio of increase which is accelerating. 
Any series of figures that increases more rapidly than an arithmetic 
progression (for example, 10, 12, 15, 19, 24, 30) slopes upward and is 
concave upward when plotted on an arithmetic grid; any series of figures 
that decreases less rapidly than an arithmetic progression (for example, 
100, 91, 83, 76, 70, 65) slopes downward and is concave upward when 
shown on arithmetic coordinates. 

TABLE 5.3 

A Series of Increasing Values 

Year          Y value     Amount          Per cent 
(X value)                 of increase     of increase 

1946              50 
1947              80          30             60.0 
1948             160          80            100.0 
1949             300         140             87.5 
1950             550         250             83.3 
1951           1,080         530             96.4 
1952           1,730         650             60.2 
1953           2,500         770             44.5 
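
The columns of Table 5.3 may be recomputed from the Y values, as in the following sketch, which also tests whether the amounts or the ratios of increase are constant. The function names are assumptions made for the illustration.

```python
def amounts_of_increase(values):
    """Differences between successive values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def per_cents_of_increase(values):
    """Percentage change from each value to the next, rounded to one decimal."""
    return [
        round(100.0 * (later - earlier) / earlier, 1)
        for earlier, later in zip(values, values[1:])
    ]


y_values = [50, 80, 160, 300, 550, 1080, 1730, 2500]  # Table 5.3

amounts = amounts_of_increase(y_values)
per_cents = per_cents_of_increase(y_values)
print(amounts)    # [30, 80, 140, 250, 530, 650, 770]
print(per_cents)  # [60.0, 100.0, 87.5, 83.3, 96.4, 60.2, 44.5]

# Neither column is constant, so the series is neither an arithmetic
# nor a geometric progression.
print(len(set(amounts)) == 1, len(set(per_cents)) == 1)  # False False
```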

Before proceeding to develop the basis for the semi-logarithmic or ratio 
grid, which will enable us to visualize ratios of change, let us examine further 
the arithmetic grid. Chart 5.3 shows the growth of motor vehicle 
registrations in the United States and in Canada from 1917 to 1953. We 
can see from this chart that registrations in the United States increased 
rapidly and, apparently, in approximately an arithmetic progression from 
1917 to 1929; held fairly constant from 1929 to 1930; dropped in 1931, 
1932, and 1933; and resumed the upward movement from 1934 to 1937, 
only to fall slightly in 1938. They rose from 1938 to 1941, fell from 1941 
to 1944, and increased from 1945 to 1953, showing approximately an 
arithmetic progression from 1945 to 1951. Changes in registration in 
Canada are difficult to see because the scale which must be used to accommodate 
the United States causes the curve for Canada to fall rather close 
to the base line. However, it appears that registrations in Canada 
increased from 1917 to 1930; decreased in 1931, 1932, and 1933; increased 
again to 1941; declined very slightly for 4 years; and increased thereafter. 
It is quite obvious that the amounts of increase and decrease each year 
were greater for the United States than for Canada, but there is no way 
of knowing from the appearance of the curves which country had the 
greater ratios of increase or decrease from year to year. 

¹ A curve representing a geometric progression is termed an “exponential curve” 
and is indicated by the equation $Y = ab^X$. The reader may be familiar with this 
equation in the form $P_n = P_0(1 + r)^n$, which is the compound interest equation and 
is discussed elsewhere in this book. A straight line representing an arithmetic 
progression is indicated by $Y = a + bX$. 

Chart 5.2. A Series of Figures Increasing by Increasing Amounts. This 
series is not a geometric progression, but may give that visual impression. Data of 
Table 5.3. 

It would not do to replot the data of Chart 5.3 by using one vertical 
scale for the United States and another for Canada, in order to magnify 
the movements of the curve for the latter. The fact that one curve is 
below another on an arithmetic grid tells us at a glance that the lower 
curve represents a series of smaller magnitude than does the upper. If 
two vertical scales are used, we have really two distinct, non-comparable 
charts, and no satisfactory visual comparisons may be made in respect to 
(1) the size of the two series plotted, (2) the amount of change which has 
taken place in one series in comparison with the amount of change in the 
other, or (3) the ratios of change of the two series. 

Chart 5.3. Motor Vehicle Registrations in the United States 
and Canada, 1917-1953. Data from Automobile Manufacturers 
Association, Automobile Facts and Figures, 1953, p. 24; The Canada Year 
Book, 1937, p. 668, 1948-49, p. 707, 1954, p. 811; Table MV4, 1953, of 
“Motor-Vehicle Registrations-1953,” issued by the Bureau of 
Public Roads; and by correspondence from the Dominion Bureau of 
Statistics. 

A GRID TO SHOW RATIOS OF CHANGE 

From what has already been said it must be obvious that graphic comparisons 
in respect to ratios of change will be facilitated if we can employ 
a sort of grid which will make a constant ratio of increase (or decrease) 
appear as a straight line. In Table 5.4 the geometric progression of 
Table 5.2 and Chart 5.1 is again shown, and with it are given the logarithms 
of the various numbers. Examination of these logarithms reveals 
that they form an arithmetic progression; therefore, if these logarithms 
are plotted on an arithmetic grid, a straight line will result, as may be 
seen in Chart 5.4. This is one way of accomplishing our objective, but 
it involves the additional step of looking up logarithms before the data 
can be plotted. However, instead of plotting the logarithms of the values 
of a series, we may use a grid which is designed with a logarithmic 
vertical scale, as in Chart 5.5. Here, again, we find that the geometric 
progression appears as a straight line. A grid of this type is termed 
semi-logarithmic because one scale is logarithmic and the other is arithmetic. 

TABLE 5.4 

A Geometric Progression and Logarithms of the 
Geometric Progression 

Year          Y value     Logarithm       Amount of 
(X value)                 of Y value      increase of 
                                          logarithms 

1946             128       2.107210 
1947             192       2.283301        .176091 
1948             288       2.459392        .176091 
1949             432       2.635484        .176092* 
1950             648       2.811575        .176091 
1951             972       2.987666        .176091 
1952           1,458       3.163758        .176092* 
1953           2,187       3.339849        .176091 

* These values differ slightly because the logarithms were 
rounded to the nearest millionth. 

Chart 5.4. Logarithms of a Geometric Progression Plotted on an 
Arithmetic Grid. Vertical scale shows logarithms. Data of Table 5.4. 

Chart 5.5. A Geometric Progression Plotted on a Semi-Logarithmic or 
Ratio Grid. Data of Table 5.2. Printed semi-logarithmic forms have more intermediate 
rulings than shown in this chart. These closely spaced lines are an aid to 
plotting but are omitted from most of the charts in this book, since reduction to fit 
the size of the page would result in bringing these lines very close together. The 
detailed ruling is shown in Chart 5.18. 
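
That the logarithms of a geometric progression form an arithmetic progression may be verified with a few lines of Python. This is a sketch for illustration only; the function names are assumptions, and common (base-10) logarithms are used, as in Table 5.4.

```python
import math


def common_logs(values):
    """Base-10 logarithms of the values."""
    return [math.log10(v) for v in values]


def successive_differences(sequence):
    """Amount of increase from each term to the next."""
    return [later - earlier for earlier, later in zip(sequence, sequence[1:])]


y_values = [128, 192, 288, 432, 648, 972, 1458, 2187]  # Table 5.2
logs = common_logs(y_values)

for y, log_y in zip(y_values, logs):
    print(y, round(log_y, 6))

# Every difference equals log 1.5, the logarithm of the constant ratio.
print([round(d, 6) for d in successive_differences(logs)])
print(round(math.log10(1.5), 6))
```

Computed without intermediate rounding, every difference is the same; the occasional .176092 in Table 5.4 arises only from rounding each logarithm to six places before subtracting.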

The logarithmic scale. The construction of the logarithmic scale merely 
involves spacing the vertical-scale values in proportion to the differences 
between their logarithms. Referring to Chart 5.6, it will be found that the 
distance from 2 to 3 on the scale is 0.352 inch, and from 3 to 4 is 0.250 inch. 
We then have: 

\[
\frac{\log 3 - \log 2}{\log 4 - \log 3} = \frac{0.352 \text{ inch}}{0.250 \text{ inch}},
\qquad
\frac{0.477 - 0.301}{0.602 - 0.477} = \frac{0.352 \text{ inch}}{0.250 \text{ inch}},
\]

and the proportion is: 

0.176 : 0.125 :: 0.352 inch : 0.250 inch. 

An alternative approach to an understanding of 
the logarithmic scale does not involve logarithms. 
Reference to Chart 5.1 will recall that equal dis- 
tances on the vertical scale of an arithmetic grid 
represent equal amounts. Equal distances measured along a logarithmic 
scale, however, represent equal ratios. On the vertical scale of Chart 5.5 
it may be seen that the distance from 100 to 200 is 0.48 inch; likewise the 
distance from 300 to 600 is 0.48 inch. Measurement will reveal that any 
two numbers of ratio 1 : 2 are separated by 0.48 inch on this scale. On 
this same scale the distance from 200 to 800 is 0.96 inch, and it follows 
that any two numbers of ratio 1:4 will be separated by 0.96 inch. Thus 
we see why the semi-logarithmic chart is frequently termed the ratio chart. 
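
The spacing of a logarithmic scale may be sketched in Python as follows. The function name and the parameter `inches_per_cycle` are assumptions of this sketch. The value of 2 inches per cycle corresponds to the statement in the caption of Chart 5.6 that each distance is twice the difference between the logarithms; the value used for Chart 5.5 is not stated in the text and is merely inferred here from the quoted distance of 0.48 inch for a ratio of 1:2.

```python
import math


def scale_distance(low, high, inches_per_cycle):
    """Distance in inches between two values on a logarithmic scale."""
    return inches_per_cycle * (math.log10(high) - math.log10(low))


# Chart 5.6: two inches for each unit of difference between logarithms.
print(round(scale_distance(2, 3, 2.0), 3))  # 0.352
print(round(scale_distance(3, 4, 2.0), 3))  # 0.25

# Chart 5.5: cycle length inferred from 0.48 inch for a ratio of 1:2.
cycle = 0.48 / math.log10(2)
print(round(scale_distance(100, 200, cycle), 2))  # 0.48
print(round(scale_distance(300, 600, cycle), 2))  # 0.48
print(round(scale_distance(200, 800, cycle), 2))  # 0.96
```

Equal ratios give equal distances whatever the absolute size of the numbers, which is the property described above.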

Chart 5.6. The Logarithmic Scale. The vertical distances are proportional
to the differences between the logarithms. Each vertical distance is twice
the difference between the logarithms measured in inches.

Chart 5.7. Logarithmic Vertical Scales. The scale beginning with 17
would be difficult to use.

The vertical scale of Chart 5.5 is divided into two parts which are
generally called cycles. We therefore refer to the paper on which Chart
5.5 was drawn as “two-cycle semi-logarithmic paper.” In labeling the
vertical scale of a semi-logarithmic chart, we may begin with any positive
value. The figure at the top of the first cycle will be ten times that at
the bottom of the cycle; the figure at the top of the second cycle will be
ten times the figure at the bottom of the second cycle (the top of the first
cycle); and so on.* In Chart 5.7 there are illustrated eight different
logarithmic scales beginning with 0.1, 1, 2, 5, 10, 17, 25, and 50,
respectively. Although it is mathematically permissible to begin a
logarithmic scale with any positive value, it is advisable to select a scale
which will allow interpolations of intermediate values to be made readily.
The scale beginning with 17 would be very difficult to use. If it were
desired to have a three-cycle scale beginning with 0.5, the various values
of the first scale could be multiplied by 5. Most ready-ruled
semi-logarithmic paper carries along the right edge of the grid such
designations as those shown in Chart 5.18. These are multiplying factors
and indicate that the value to be written opposite each horizontal line on
the left scale must be the value at the bottom of that cycle multiplied by
the figure shown opposite that horizontal line on the scale at the right.
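
The labeling rule just described may be sketched in a few lines of Python.
This is an illustration only, under stated assumptions: the name
cycle_labels and its parameters are ours and do not come from the text, and
the factors 1 through 9 stand in for the multiplying factors printed along
the right edge of ready-ruled paper.

```python
def cycle_labels(start, cycles, factors=(1, 2, 3, 4, 5, 6, 7, 8, 9)):
    """Values to write opposite the rulings of a logarithmic scale.

    start   -- value at the bottom of the first cycle (must be positive)
    cycles  -- number of cycles on the paper
    factors -- multiplying factors within one cycle (an assumed layout)
    """
    if start <= 0:
        raise ValueError("a logarithmic scale must begin with a positive value")
    labels = []
    bottom = start
    for _ in range(cycles):
        # Each label is the bottom of the cycle times the multiplying factor.
        labels.extend(bottom * factor for factor in factors)
        # The top of one cycle is ten times its bottom and starts the next cycle.
        bottom *= 10
    labels.append(bottom)  # top of the last cycle
    return labels


# A three-cycle scale beginning with 0.5, as mentioned in the text.
scale = cycle_labels(0.5, 3)
```

Beginning the scale with zero raises an error in this sketch, since ten
times zero is still zero and no scale could be built up from it.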

If a logarithmic scale were begun with zero, the top of the first cycle
would be 10 × 0 = 0, and all values on the scale would also be zero.
Suppose that the uppermost value of a three-cycle logarithmic scale is
0.01. Then the bottom of the third cycle is 1/10 of 0.01, or 0.001; the
bottom of the second cycle is 0.0001; and the bottom of the first cycle is
0.00001. There can thus be no zero base line, and the semi-logarithmic
chart does not permit interpretation of curves in terms of distances above 
a base line as does the arithmetic chart. Although plotted values may, 
of course, be read against the vertical logarithmic scale, no visual impres- 
sion may be had of the absolute magnitudes plotted. The semi-logarith- 
mic chart shows: (1) a constant ratio of change as a straight line; (2) the 
ratio of increase or decrease by the slope of the line; and (3) the compari- 
son of ratios between two or more lines by means of parallelism of these 
lines or lack of it. 
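
The first and third of these properties may be checked numerically. The
sketch below is illustrative only: the name log_steps and the three series
are hypothetical and are not taken from any chart in this chapter.

```python
import math


def log_steps(series):
    """Vertical steps between successive points plotted on a logarithmic scale."""
    logs = [math.log10(value) for value in series]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


# Hypothetical series growing by a constant ratio of 1.5 per period.
constant_ratio = [100 * 1.5 ** k for k in range(5)]
# A second hypothetical series, 40 times as large, with the same ratio of change.
larger_series = [40 * value for value in constant_ratio]
# Hypothetical series growing by a constant amount of 50 per period.
constant_amount = [100 + 50 * k for k in range(5)]

ratio_steps = log_steps(constant_ratio)    # equal steps: a straight line
larger_steps = log_steps(larger_series)    # the same steps: a parallel line
amount_steps = log_steps(constant_amount)  # shrinking steps: a flattening curve
```

Equal steps mean a straight line on the ratio chart; identical steps for
the two series mean parallel lines, however different their magnitudes.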

Whenever a logarithmic scale is employed, enough rulings, or rulings 
and tics, should be shown so that the reader will be aware that he is not 
seeing a chart drawn on an arithmetic grid. Since there are other 
unequally spaced scales in addition to the logarithmic scale (for example, 
the reciprocal scale), it is sometimes also desirable to state: “ratio
chart,” “semi-logarithmic chart,” or “logarithmic vertical scale.”

Note that a logarithmic scale may cover an integral number of cycles, 
as in Chart 5.5, which has two cycles. On the other hand we may use 
part of one cycle, as in Chart 5.14, or we may employ one or more cycles 
and part of another cycle, as in Chart 5.9. 

* A common logarithm is the power to which 10 must be raised to produce a
given number. Thus, 100 is 10², and the logarithm of 100 is 2.0; 10,000 is
10⁴, and the logarithm of 10,000 is 4.0.

Chart 5.8A. Curves on Arithmetic and Semi-Logarithmic Grids. The two
curves in each of the lower eight squares are equidistant vertically from
each other.

Arithmetic Vertical Scales
A, A′: Constant amounts of increase, same for both curves.
B, B′: Different constant amounts of increase, greater for B.
C, C′: Different constant amounts of increase.
D, D′: Constant amounts of decrease, same for both curves.
E, E′: Different constant amounts of decrease, greater for E.
F, F′: Different constant amounts of decrease, greater for F′.
G, G′: Amounts of increase increasing, same for both curves.
H, H′: Amounts of increase decreasing, same for both curves.
I, I′: Amounts of decrease increasing, same for both curves.
J, J′: Amounts of decrease decreasing, same for both curves.

Logarithmic Vertical Scales
a, a′: Constant relative increases, same for both curves.
b, b′: Different constant relative increases, greater for b.
c, c′: Different constant relative increases, greater for c.
d, d′: Constant relative decreases, same for both curves.
e, e′: Different constant relative decreases, greater for e.
f, f′: Different constant relative decreases, greater for f.
g, g′: Relative increases, increasing, same for both curves.
h, h′: Relative increases, decreasing, same for both curves.
i, i′: Relative decreases, increasing, same for both curves.
j, j′: Relative decreases, decreasing, same for both curves.




ARITHMETIC VERTICAL SCALES

An arithmetic progression.
A series in which the absolute change is increasing:
   a. If relative change is increasing.
   b. If relative change is constant.
   c. If relative change is decreasing.
A series in which the absolute change is decreasing.
Two arithmetic progressions, same absolute changes.

LOGARITHMIC VERTICAL SCALES

A geometric progression.
A series in which the relative change is increasing.
A series in which the relative change is decreasing:
   A. If absolute change is increasing.
   B. If absolute change is constant.
   C. If absolute change is decreasing.
Two geometric progressions, same relative changes.

Chart 5.8B. Comparisons of Series of Various Types Plotted in Relation to
Arithmetic and Logarithmic Vertical Scales. Series plotted as shown on one
scale become as indicated on the other. The above comparisons refer to
increasing series only. It is suggested that the reader sketch some
comparisons involving declining series.

Interpretation of curves. Before proceeding with a consideration of
applications of the semi-logarithmic chart, attention should be given to
Charts 5.8A and 5.8B and the comments below them. When two straight
lines are parallel on semi-logarithmic paper (for example, a, a′; d, d′), we
know that they have constant ratios of change and also that the ratio
between the two has remained constant. Parallelism between curved
lines is very difficult to judge with the eye. Reference to the lower
sections of Chart 5.8A will show that the curved lines are always the same
vertical distance apart, and thus the two curves in each section are parallel
with respect to the X-axis.

APPLICATIONS 


Comparing ratios of increase or decrease. Since there is no zero
on the vertical scale of the semi-logarithmic chart, and thus no base line,
and since equal vertical distances (on the same scale) always represent
the same ratio, it is permissible to use two or more different vertical scales
in order to bring curves of different magnitude close together for
comparison. This has been done in Chart 5.9, which presents the data of
motor vehicle registrations previously shown on an arithmetic grid in
Chart 5.3. Shifting the vertical scale of a semi-logarithmic chart moves
the curve upward or downward, but the slope, which is of paramount
importance, is not altered thereby. When using two logarithmic scales,
as in Chart 5.9, it is desirable (though not absolutely necessary) to keep
the series of smaller magnitude below that of greater magnitude; likewise,
if one or more components are being compared with a total, the curves
for the components should be below that for the total.

Chart 5.9. Motor Vehicle Registrations in the United States and Canada,
1917-1953. Data from sources given below Chart 5.3.

Chart 5.3 gave us no idea of the relative growth of automobile
registrations in either the United States or Canada. Chart 5.9, however,
shows relative growth for each series and enables us to compare the ratios
of growth of these two series of dissimilar size. In general, both series have
shown about the same ratios of increase and decrease throughout the
period. However, the ratio of increase from 1947 to 1953 is seen to be
greater for Canada. The insert on Chart 5.9 makes it possible to estimate
the ratio of increase or decrease from any one year to the next for the
curves shown. It does not, however, apply to other charts which have
different scales.

Chart 5.10. Annual Per Cent of Increase or Decrease in Motor Vehicle
Registrations in the United States and Canada, 1918-1953. Data from
sources given below Chart 5.3.

An alternative method of showing the relative change in motor vehicle
registrations in the United States and Canada consists of calculating the
per cent of change for each year and plotting the results on an arithmetic
grid. This has been done in Chart 5.10.

Instead of comparing the percentages of change of two different series
over the same period of time, we may be interested in comparing ratios
of growth of the same series at different times. Thus in Chart 5.9 we can
see that the per cent of increase of United States automobile registrations
was greater from 1950 to 1951 than from 1951 to 1952, and also that the
relative decline was greater from 1942 to 1943 than from 1943 to 1944.
Similar conclusions may be drawn from Chart 5.10.
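
The calculation behind a chart such as Chart 5.10 is simple enough to
sketch. The function name percent_changes and the three figures below are
hypothetical; they are not the registration data plotted in the charts.

```python
def percent_changes(values):
    """Per cent of change from each value to the next in a chronological series."""
    return [100.0 * (later - earlier) / earlier
            for earlier, later in zip(values, values[1:])]


# Hypothetical annual figures, chosen only to keep the arithmetic easy.
registrations = [200.0, 250.0, 225.0]
changes = percent_changes(registrations)  # [25.0, -10.0]
```

Each result is a ratio of change expressed as a per cent, so the values may
properly be compared as distances above or below zero on an arithmetic grid.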

It is frequently necessary to compare series which are expressed in
different units. For example, we may compare any two or more of the
following: commercial failures, in millions of dollars; volume of trading on
a stock exchange, in number of shares traded; coal production, in
2,000-pound tons; petroleum production, in 42-gallon barrels; lumber
production, in board feet; cement production, in 376-pound barrels; electric
power produced, in kilowatt hours; manufactured gas, in cubic feet. It
is possible to reduce 376-pound barrels to tons, but it is not possible to
change kilowatt hours to board feet, or vice versa.

Chart 5.11. Average Monthly Production of Electric Power and of
Portland Cement, 1935-1953. Data from U. S. Department of Commerce, Office
of Business Economics, Business Statistics, 1953, pp. 131 and 183, and from
Survey of Current Business, February 1954, pp. S-26 and S-38.

While one could plot two series expressed in different units on an
arithmetic grid, it is not often that such a comparison is useful. Except
to ascertain whether the two series fluctuate concurrently, we are not
likely to be interested in comparing the changes in electric power
production in kilowatt hours with the changes in cement production in
barrels. Rather are we apt to want to compare the percentage change in
electric power production with the percentage change in cement
production. On the semi-logarithmic grid, there is no zero base line; only the
slope of a curve has meaning, and we are enabled to make a valid
comparison of the relative changes in the two series expressed in such
dissimilar units as those just mentioned. Chart 5.11 shows a comparison of
the production of electric energy and of portland cement. Among other
interesting comparisons may be noted the more rapid ratio of growth in
the production of electric power from 1949 to 1953 and the relatively
more severe decline in production of cement from 1937 to 1938, the only
year during which both series dropped.

Comparing fluctuations. Comparison of the fluctuations taking
place in two chronological series of different size may be illustrated by
reference to Charts 5.12 and 5.13, which show the production of
bituminous coal and of coke for 1935-1953. Both series are expressed in terms
of short tons, but production of bituminous coal greatly exceeds the
production of coke. The result is that when the two series are shown
on an arithmetic grid, as in Chart 5.12, the fluctuations of the larger
series may be clearly seen but those of the smaller series are not apparent.
When the two sets of data are depicted on a semi-logarithmic grid (Chart
5.13), not only can the fluctuations of both series be seen, but their
relative severity may be compared. For example, it is clear from Chart 5.13
that the ratio of increase in the production of coke from 1938 to 1940 was
greater than the ratio of increase in the production of bituminous coal for
these same years, and also that the relative decrease from 1948 to 1949 was
greater for coal than for coke.

MILLIONS OF SHORT TONS

Chart 5.12. Production of Bituminous Coal and Coke, 1935-1953. Data
from U. S. Department of Commerce, Office of Business Economics, Business
Statistics, 1953, pp. 168 and 170, and Survey of Current Business, March
1954, pp. S-34 and S-35. Figures for coke include byproduct (oven),
beehive, and petroleum coke.

Instead of being interested in two series, we may wish to compare the
undulations of a single series which fluctuated around relatively small
values during one period and around decidedly larger values at another
time. For example, commercial failures were around $100,000,000 to
$200,000,000 annually from 1895 to 1910. From 1921 to 1933 they
ranged from $400,000,000 to $933,000,000. In the early 1950’s, they
were lower again. The semi-logarithmic chart enables us to study the
relative severity of the fluctuations during such different periods.

Chart 5.13. Production of Bituminous Coal and Coke, 1935-1953. Data
from sources given for Chart 5.12.

Showing ratios. Chart 5.14 shows how ratios may be presented on
the semi-logarithmic chart. The two series plotted are the price per
bushel received by farmers for corn, and the price per 100 pounds received
by farmers for hogs. When corn is bringing a price which is low in
relation to the price of hogs, farmers will generally find it profitable to feed
corn to hogs rather than to sell the corn for cash. On the other hand,
when corn is bringing a price which is high in relation to that of hogs,
farmers will tend to sell corn for cash. If 100 pounds of hogs brings the
farmer about 13 times as much as a bushel of corn, it is largely immaterial
to the farmer whether he sells his corn for cash or feeds the corn to his
hogs.* For this reason the two scales of Chart 5.14 have been placed in a
13-to-1 ratio.† The chart not only shows the fluctuations in the price of
hogs and the price of corn, but also makes it easy to see when the price of
100 pounds of hogs is more than, less than, or exactly 13 times the price of
a bushel of corn. When 100 pounds of hogs is selling for more than 13
times as much as a bushel of corn, the curve for hogs is above the curve
for corn, hogs are relatively valuable, and farmers tend to feed corn to
their hogs. When 100 pounds of hogs is selling for less than 13 times as
much as a bushel of corn, the curve for hogs is below that for corn, corn
is relatively valuable, and farmers tend to sell corn for cash. When the
two curves are parallel, the ratio is remaining constant; when the
corn-price curve is sloping upward more rapidly (or downward less rapidly)
than the hog-price curve, corn is becoming more valuable in relation to
hogs; when the corn-price curve is sloping upward less rapidly (or
downward more rapidly) than the hog-price curve, corn is becoming less
valuable in relation to hogs. The supplementary scale, which is a
separate piece of paper and which is shown on the chart, enables the reader
to measure the ratio between the two price curves at any time.

* See page 145, where the hog-corn ratio is discussed.

† The scale for hog prices is awkward but is unavoidable in this instance.

Chart 5.14. Average Farm Prices of Corn, per Bushel, and of Hogs, per
Hundred Pounds, January 1948-December 1952. The supplementary scale
enables us to read the ratio of hog prices to corn prices for any month. The value
13 is placed opposite the line for corn and the value opposite the hog line gives the
ratio of the hog price, per hundred pounds, to the corn price, per bushel. For
March 1952, the ratio is shown to be slightly more than 10, which may be verified
by referring to Chart 5.15. The supplementary scale is graduated in the same
manner as is the scale at the right of the chart, the figure 13 being placed opposite
the corn line because the scale for hog prices has values which are 13 times the
corresponding values on the scale for corn prices. Data from U. S. Department of
Agriculture, Production and Marketing Administration, Market News, Livestock
Branch, Statistical Bulletin No. 118, November 1952, p. 40, and Bureau of
Agricultural Economics, Statistical Survey, December 1951-February 1953.

Chart 5.15. Hog-Corn Ratio, January 1948-December 1952. The
ratio is obtained by dividing the average farm price of hogs per hundred
pounds by the average price of corn per bushel; the ratio is the number of
bushels of corn required to buy a hundred pounds of live hogs at the prices
quoted. Data from U. S. Department of Agriculture, Production and
Marketing Administration, Market News, Livestock Branch, Statistical
Bulletin No. 118, November 1952, p. 39, and Bureau of Agricultural Economics,
Crop Reporting Board, Agricultural Prices, June 30, 1952-January 30, 1953.

Chart 5.15 illustrates another method of showing the relationship
between hog and corn prices. Here the ratio of hog prices to corn prices
has been computed for each month and plotted on an arithmetic grid.
The ratio may be studied without the use of a supplementary scale, but
changes in corn prices and in hog prices are not shown.
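
The hog-corn ratio and the 13-to-1 comparison discussed above may be
sketched as follows. The function names and the prices are hypothetical,
chosen only for easy arithmetic, and the break-even figure of 13 is the
approximate one given in the text, not an exact constant.

```python
def hog_corn_ratio(hog_price_per_cwt, corn_price_per_bushel):
    """Bushels of corn required to buy 100 pounds of live hogs."""
    return hog_price_per_cwt / corn_price_per_bushel


def farmer_tendency(ratio, break_even=13.0):
    """Tendency described in the text for a given hog-corn ratio."""
    if ratio > break_even:
        return "feed corn to hogs"    # hogs are relatively valuable
    if ratio < break_even:
        return "sell corn for cash"   # corn is relatively valuable
    return "largely immaterial"


# Hypothetical prices in dollars: hogs per 100 pounds, corn per bushel.
ratio_low = hog_corn_ratio(20.0, 2.0)    # 10.0
ratio_even = hog_corn_ratio(26.0, 2.0)   # 13.0
ratio_high = hog_corn_ratio(30.0, 2.0)   # 15.0
```

On Chart 5.14 the same three cases appear as the hog curve lying below,
upon, or above the corn curve.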

Interpolation and extrapolation. While an interpolation on an
arithmetic chart is an arithmetic interpolation, an interpolation on the
semi-logarithmic chart is a logarithmic interpolation. Thus, if we refer
to Chart 5.5 and graphically interpolate for the Y value midway between
1950 and 1951, we obtain about 790, which is approximately the same
figure that we get if we use (log 648 + log 972) ÷ 2 and take the
antilogarithm of the result.
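
The difference between the two kinds of interpolation may be seen by
working the figures 648 and 972 both ways. The function names below are
ours, introduced only for this sketch.

```python
import math


def logarithmic_midpoint(lower, upper):
    """Value read midway between two plotted points on a logarithmic scale."""
    mean_log = (math.log10(lower) + math.log10(upper)) / 2
    return 10 ** mean_log  # the antilogarithm of the mean logarithm


def arithmetic_midpoint(lower, upper):
    """Value read midway between two plotted points on an arithmetic scale."""
    return (lower + upper) / 2


on_ratio_chart = logarithmic_midpoint(648, 972)      # close to 793.6
on_arithmetic_chart = arithmetic_midpoint(648, 972)  # 810.0
```

The logarithmic result is the geometric mean of the two values, which
agrees with the graphic reading of about 790; the arithmetic midpoint is
somewhat higher.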

Extrapolation consists of extending the curve at one end or the other.
When we extend a curve to estimate for later years than those for which
we have data, we are forecasting. This application of the
semi-logarithmic chart is definitely of questionable value if it involves only the
extension of a curve which has indicated in the past that the data exhibit
a fairly constant rate of increase. Any forecasting procedure which
involves merely the continuation of a curve or the automatic application
of a formula, without at the same time requiring a careful consideration
of underlying and modifying factors, is hardly to be depended upon,
particularly if economic conditions are in a state of flux. The curve of
Chart 5.16 shows the population of the East South Central Division of
the United States from 1800 to 1950. Although the extension of the
curve indicates a possible estimate for 1960, it should be realized that any
estimate of population in 1960 based only on a knowledge of the preceding
censuses can have little validity. Ignored have been such considerations
as: movements of industry to (or from) the division, possible increase in
population in the division because of decentralization of cities located
elsewhere, continued movement of Negroes from the division or a reversal
of that movement, and other factors.*

POPULATION IN THOUSANDS

Chart 5.16. Population of the States in the East South Central
Division of the United States, 1800-1950, and a Rough Estimate for
1960. A dubious application of the semi-logarithmic chart. The states
included in the East South Central Division are: Alabama, Kentucky,
Mississippi, and Tennessee. Data from U. S. Bureau of the Census, U. S.
Census of Population, 1950, Vol. I, Number of Inhabitants, pp. 1-8 and 1-9.

Now that the reader is aware of the nature and uses of the
semi-logarithmic chart, he may note the occasional presentation of arithmetic
charts in books, articles, or reports when semi-logarithmic charts would
have been more suitable. The reverse mistake is rarely made. Each
type of chart serves a useful, but quite different, purpose. The
arithmetic chart should be used when absolute comparisons are desired
(Charts 5.10 and 5.15 are absolute comparisons of ratios); the
semi-logarithmic chart should be employed when relative comparisons are
called for.

* The problems involved in forecasting population are discussed in Better
Population Forecasting for Areas and Communities, by Van Beuren Stanbery,
issued by the U. S. Department of Commerce.

CONSTRUCTION OF LOGARITHMIC SCALES 

One logarithmic cycle will accommodate a tenfold increase; two cycles
make provision for a hundredfold increase. Reference to the various
charts included in this chapter will show that no vertical logarithmic scale
(other than those shown in Chart 5.7) extends over more than two cycles.
Two-cycle semi-logarithmic paper will suffice for most series which the
chart maker is likely to encounter; rarely will he need paper covering
more than three cycles, since three cycles allow for a thousandfold
increase. Even in cases where a series of very small magnitude must be
compared with one of very large magnitude, a large number of cycles is not
needed, since it is desirable to use two vertical scales to bring the two
curves together for comparison, as in Charts 5.9 and 5.13. Many sorts of
ready-ruled semi-logarithmic paper are available from various sources. If,
however, only two-cycle paper is available and paper having more cycles is
needed, it is merely necessary to trim the lower margin from a sheet of
two-cycle paper and paste it above another sheet.
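
The number of cycles a series calls for may be reckoned from the ratio of
its largest to its smallest value. The sketch below assumes that the scale
is begun exactly at the smallest value, which is permissible but not
always advisable, as noted earlier; a scale begun at a rounder value below
the smallest figure may need one more cycle. The function name
cycles_needed is ours.

```python
import math


def cycles_needed(smallest, largest):
    """Whole logarithmic cycles required when the scale begins at `smallest`."""
    if smallest <= 0 or largest < smallest:
        raise ValueError("values must be positive, with largest >= smallest")
    # Each cycle accommodates a tenfold increase.
    return max(1, math.ceil(math.log10(largest / smallest)))


# Commercial failures ranging from $100,000,000 to $933,000,000 (figures
# mentioned in the text) are less than a tenfold range.
failures = cycles_needed(100_000_000, 933_000_000)  # 1
# A hypothetical series running from 5 to 4,000 is an 800-fold range.
wide_series = cycles_needed(5, 4000)                 # 3
```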

At times it may be desirable to use one- or two-cycle paper, but with a
larger- or smaller-size cycle than those which are readily available. Using
an ordinary sheet of semi-logarithmic paper and placing a sheet of plain 
paper diagonally on top of it, a logarithmic scale may be expanded as 
shown in Chart 5.17. A logarithmic scale may be contracted by placing 
a sheet of semi-logarithmic paper diagonally on a piece of plain paper and 
ruling horizontal lines, as shown in Chart 5.18. For those who have 
frequent occasion to use logarithmic scales of varying size, a device such as
that shown in Chart 5.19 is useful.* The original of this chart provides
a logarithmic cycle varying from li inches to 12 inches. Of course, any 
number of cycles may be built up on top of one another. 

* Designed by Harriet Edmunds, of The Chartmakers, Inc., 480 Lexington Ave.,
New York, N. Y.

Chart 5.19. A Flexible Logarithmic Scale. The original provides logarithmic scales ranging from li to 12 inches.

In case no suitable logarithmic paper and no logarithmic scales of any
sort are available, it is possible to construct a logarithmic scale of any
desired size by referring to a table of logarithms. With scale values
spaced in proportion to the differences between their logarithms, a scale
may be constructed in terms of any convenient unit of length: the distance
from 1 to 2 would be 0.301030 units, that from 2 to 3 would be 0.176091
units, and so on. Intermediate values are located similarly.


Scale value     Logarithm     Difference
      1         0.000000
      2         0.301030       0.301030
      3         0.477121       0.176091
      4         0.602060       0.124939
      5         0.698970       0.096910
      6         0.778151       0.079181
      7         0.845098       0.066947
      8         0.903090       0.057992
      9         0.954243       0.051153
     10         1.000000       0.045757
     20         1.301030       0.301030
     30         1.477121       0.176091
     40         1.602060       0.124939
     50         1.698970       0.096910
     60         1.778151       0.079181
     70         1.845098       0.066947
     80         1.903090       0.057992
     90         1.954243       0.051153
    100         2.000000       0.045757
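
The construction from a table of logarithms may also be sketched in code.
The function name ruling_positions is ours, and the cycle length of 2
inches is taken from the caption of Chart 5.6, where each vertical distance
is twice the difference between the logarithms; any other length could be
substituted.

```python
import math


def ruling_positions(values, cycle_length):
    """Distance of each ruling above the first, in the unit of cycle_length.

    The distances are proportional to the differences between logarithms,
    one full cycle (a tenfold increase) occupying cycle_length units.
    """
    base = math.log10(values[0])
    return [(math.log10(value) - base) * cycle_length for value in values]


# One cycle from 1 to 10, drawn 2 inches high.
positions = ruling_positions(list(range(1, 11)), 2.0)
```

The ruling for 2 falls about 0.602 inch above the ruling for 1, and the
ruling for 10 falls exactly 2 inches above it.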



shown in this chapter. In Chap , jg In Chapter 20 we 



CHAPTER 6 


Graphic Presentation III:

OTHER TYPES OF CHARTS 


A number of other graphic devices, in addition to curves, are available 
for presenting statistical information. In this chapter we shall give 
brief attention to bar charts, pie diagrams, pictographs, and statistical 
maps. 


BASES OF COMPARISON 

Chart 6.1. Number of Tractors on Farms in the United States,
1930, 1940, 1945, 1950, and 1953. The data are represented by (A)
bars, (B) circles, (C) squares, and (D) pictures of tractors. Part A
involves linear comparisons; parts B and C require comparisons of
areas; part D calls for comparisons of volumes. Data from Agricultural
Statistics, 1952, p. 631, and 1953, p. 660.

Chart 6.1 shows how the number of tractors on farms may be compared
by means of three types of diagrams: (A), a bar chart involving
one-dimensional comparisons; (B) and (C), circles and squares, involving
two-dimensional comparisons; and (D), a three-dimensional comparison
represented by tractors of varying sizes. Readers of charts obtain most
accurate impressions of the magnitudes shown when data are represented
by means of bar charts, and least accurate impressions when data are
represented by volume diagrams. Area diagrams are more accurately
judged than volume diagrams, but less accurately than bar charts.* It
should also be remembered that volume diagrams shown on the printed
page make it necessary for the reader to visualize the third dimension
before making his comparison. Another disadvantage of charts using
squares, circles, or pictures of different sizes is that the reader may be
uncertain whether to compare heights, areas, or volumes. In any event,
the basis upon which the diagram was drawn should be indicated. If it
is argued that the correct basis of comparing the size of such objects as
tractors is the apparent weight of the different tractors, and if the chart
maker has drawn the tractors so that the number of tractors in different
years is shown by the height or length of the tractors, as is sometimes
done, then the reader who judges the sizes upon the basis of apparent
weight (essentially volume) will get an exaggerated impression of the
variation in number of tractors during the different years.

* See “Graphic Comparisons by Bars, Squares, Circles, and Cubes,” Frederick
E. Croxton and Harold Stein, Journal of the American Statistical Association,
March 1932, pp. 54-60.
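
The exaggeration just described may be put in figures. The sketch below is
a simplification under stated assumptions: it supposes that the picture is
enlarged in every direction in proportion to the data, and the ratio of 4
is hypothetical rather than taken from the tractor figures. The function
name apparent_ratio is ours.

```python
def apparent_ratio(linear_ratio, dimensions):
    """Ratio of sizes perceived when 1, 2, or 3 dimensions are compared."""
    if dimensions not in (1, 2, 3):
        raise ValueError("dimensions must be 1 (length), 2 (area), or 3 (volume)")
    return linear_ratio ** dimensions


# Hypothetical case: the later figure is four times the earlier one, and the
# picture for the later year is drawn four times as tall.
as_length = apparent_ratio(4, 1)   # 4
as_area = apparent_ratio(4, 2)     # 16
as_volume = apparent_ratio(4, 3)   # 64
```

A reader who judges by height sees the true fourfold increase; one who
judges by area or by apparent weight sees a much larger one.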

Charts involving volume comparisons appear all too often in news- 
papers and magazines. Later in this chapter we shall see how it is pos- 
sible, by means of pictographs, to obtain the attention-getting value of 
pictures and at the same time get visual impressions as accurate as may 
be had from bar charts. 


BAR CHARTS 

The bar chart shown in section A of Chart 6.1 is a simplified form using 
no scale. In Chart 6.2 the same data are shown by means of a bar chart 

THOUSANDS 
OF TRACTORS 



Chart 6.2. Number of Tractors on Farms in the
United States, 1930, 1940, 1945, 1950, and 1953. Data
from sources given below Chart 6.1.


which has a scale and which also varies the spacing between the bars in
order to call attention to the fact that the time intervals vary. When the
chart is expected merely to convey a very general impression, simple bar
charts may be drawn without the use of a scale, as in section A of Chart
6.1. However, when two (or more) bar charts using different scales are
in juxtaposition and may be compared with each other, the scales should
be shown. Another caution concerns the presence of zero on the scale;
Chart 6.3, which lacks the zero, shows that the omission of the zero is just
as misleading in this type of chart as in the case of arithmetic curves.

All of the preceding bar charts showed chronological data, and,
following the customary procedure, the bars were arranged vertically. Vertical
bars should also be used for data classified quantitatively, for example,
data of the number of persons in the United States classified by age groups
or according to years of schooling. When making comparisons of data
classified qualitatively or geographically, on the other hand, horizontal
bars are generally used. Chart 6.4 shows such a comparison of the
acreage harvested of each of five crops in 1952.

Chart 6.3. A Bar Chart Lacking a Zero on the
Vertical Scale. From National Board of Fire Underwriters,
Fire Insurance Facts and Trends, August 1953.

MILLIONS OF ACRES

Corn       81,359,000
Wheat      70,585,000
Oats       38,643,000
Barley      8,264,000
Rye         1,385,000

Chart 6.4. Acreage Harvested in the United States of Corn,
Wheat, Oats, Barley, and Rye, 1952. Data from Agricultural
Statistics, 1953, pp. 1, 16, 31, 41, and 47. The acreages given for oats,
barley, and rye are the acreages harvested for grain.

Chart 6.5. An Application of the Bar Chart. From United States
Bureau of Labor Statistics, Wages, Hours, and Working Conditions in the
Bread-Baking Industry, 1934, Bulletin No. 623, p. 75.


MILLIONS OF PERSONS

NATIVE BORN     FOREIGN BORN

Chart 6.6. Native-Born and Foreign-Born Population of the United
States, 1880-1950. The relative growth of the two series is not apparent from
this type of chart, but may be shown by means of a semi-logarithmic chart, as
described in the preceding chapter. Because of the nonexistence of zero on a
logarithmic scale, curves would be used instead of bars. Data from Statistical
Abstract of the United States, 1953, p. 31, and from U. S. Census of Population,
1950, Vol. II, Part 1, Chapter B, p. l-ST, and Vol. IV, Part 3, Chapter B, p.
3B-82.

There are no set rules to be observed in drawing bar charts. Certain
considerations, however, are helpful.

(1) Individual bars should be neither exceedingly short and wide nor 
very long and narrow. 

(2) Bars should be separated by spaces which are not less than about i 
the width of a bar or greater than about the width of a bar. 

MILLIONS OF ACRES 



Chart 6.7. Acreage Harvested in the United States
of Corn, Wheat, Oats, Barley, and Rye, 1940 and 1952.
Data from source given below Chart 6.4.

(3) A scale is generally useful. It should be about i the width of a bar 
from the top bar (or from the left bar, if the bars are vertical). 

(4) Guide lines are an aid in reading the chart. Sometimes the chart
is enclosed and the guide lines are extended through the entire chart, as
in Chart 6.4; sometimes the chart is not enclosed and the guide lines are
cut off, as in Chart 6.7.

When showing a time series graphically, we may use either a bar chart
or a curve. A curve facilitates a study of the general change which has
taken place in a series, whereas a bar chart enables comparisons of specific
years to be made more readily. If the series covers many years, it is
generally not desirable to use a bar chart, which is laborious to construct.
When only a few years are shown, as in Chart 6.2, a bar chart is preferable.

Chart 6.5 shows an interesting application of the principle of the bar
chart. It indicates for each of 93 bakeries the proportion of day and night
operation during a year. The advantage of this chart is that it shows the
information for each of the 93 concerns in a more compact form than could
well be done otherwise.

PERCENT CHANGE, 1952-53

Chart 6.8. Percentage of Increase or Decrease in
Planned Plant and Equipment Outlays for 1953 as Compared
with Investment for 1952, for Six Industry
Groups. From Survey of Current Business, April 1953, p. 1.

Sometimes we wish to compare two sets of data over a period of several 
years. This may be done by means of a two-unit bar chart, as shown in 
Chart 6.6. Similarly, we may wish to compare several categories for 
two years; a comparison of this nature is shown in Chart 6.7. 

A two-direction bar chart, such as Chart 6.8, may be used to show 
increases and decreases. This type of chart is even more effective if 
increases can be shown in black and decreases in red. Increases and 
decreases in a series of data for a number of years may be shown by means 
of vertical bars above and below a horizontal zero line. 

PICTOGRAPHS 

In section D of Chart 6.1 the number of tractors on farms at each of
certain years was represented by means of pictures of tractors of varying
size. While this sort of chart does not convey a satisfactory comparison
to a reader, it does attract attention. The pictorial effect may be
retained and a satisfactory visual comparison afforded by using a number
of small pictures, all of the same size, and arranging them so as to form
a bar chart. Such a graph is referred to as a pictograph. Chart 6.9
shows a comparison of tractors on farms by means of this device. While
the diagram is essentially a bar chart, it is more attractive and thus is
more likely to be examined by a reader. No scale is used, but since the
pictures are all of the same size and since each represents one million
tractors, approximate numerical values may be had from the chart, if they
are wanted. Although a bar chart of a time series generally uses vertical
bars, it will be observed that the pictograph shown as Chart 6.9 has
horizontal bars. Pictographs are often arranged in this way because it seems
more suitable to have tractors, people, houses (or whatever is being
pictured) standing side by side rather than on top of one another.

Chart 6.9. Number of Tractors on Farms in the United States, 1930, 1940,
1945, 1950, and 1953. Each symbol represents 1,000,000 tractors. Data from
Agricultural Statistics, 1952, p. 631, and 1953, p. 560. The tractor
was designed by Pictorial Statistics Co.

Chart 6.10. A Pictograph Used by Hobart and William Smith Colleges. The
chart states that about 80% of the money comes from 10% of the givers.
From Let's Look at Hobart and William Smith, p. 14. The original was in two
colors.

Chart 6.11. A Modified Pictograph. From Health Insurance Council,
Accident and Health Coverage in the United States, September 1953, p. 21.

Chart 6.10, another example of a pictograph, is an interesting method 
of showing that campaigns for funds are apt to depend heavily upon 
relatively few large gifts. Chart 6.11 represents a slightly different 
application of the pictograph idea. Here, bars and a scale are used, but 
pictures are superimposed upon the bars. (Use was also made of a 
picture in Chart 6.3, which was not a pictograph.) It should be apparent
that, in making a pictograph, the picture is so chosen as to suggest the
nature of the data being shown. Certain basic rules for the use of
pictorial devices are shown in Chart 6.12.

Chart 6.12. The Basic Rules for Drawing Pictographs as Suggested by
Modley and Lowenstein. The panels of the chart read: symbols should be
self-explanatory; changes in numbers are shown by more or fewer symbols,
not by larger or smaller ones; charts give an over-all picture, not minute
details; pictographs make comparisons, not flat statements. From Rudolph
Modley and Dyno Lowenstein, Pictographs and Graphs, Harper and Brothers,
New York, 1952, pp. 25 and 26.

COMPONENT-PART CHARTS 

The parts of a total may be shown by means of a bar as in Chart 6.13
or by a pie diagram as in Chart 6.14. The bar chart involves a
one-dimensional comparison of the lengths of the sections of the bar; whereas
the pie diagram involves a two-dimensional comparison of the pie sections,
or a one-dimensional comparison of the arcs of the pie sections, or a
comparison of the central angles. Accuracy of judgment is about the same
whether based on a bar chart or a pie diagram,¹ with the exception that,
when depicted by a pie diagram, 25-per-cent (shown by a right angle) and
50-per-cent (shown by a diameter) sections are more accurately gauged.
The pictorial value of the pie diagram is perhaps greater than that of the
bar chart, and it is increased when the pie diagram is designed to suggest
a silver dollar. Chart 6.15 shows an application of this sort. A single
component-part bar is occasionally drawn without a scale and is sometimes
horizontal. One advantage of the vertical bar over either the horizontal
bar or the pie diagram is that the sections of the vertical bar are
easier to label.

¹ See “Bar Charts Versus Circle Diagrams,” by Frederick E. Croxton and Roy E.
Stryker, Journal of the American Statistical Association, December 1927, pp. 473-482.

Several suppliers of graph paper offer sheets showing a circle with the
circumference graduated from 0 to 100, thus enabling one to construct pie
diagrams readily. If such sheets are not available or if varying sizes of
circles are desired, pie diagrams may be made by the use of compasses and
a protractor. Since the conventional protractor divides a circle into 360
parts or degrees, the percentages which are to be shown should be
multiplied by 3.6. Dividing a circle into percentages is facilitated by use
of a protractor² calibrated to divide a circle into 100 parts, as shown in
Chart 6.16; such a scale may be engraved or otherwise marked on the back
of an ordinary protractor.

Chart 6.17 shows how bar charts may be used to compare several sets of
component parts and also how the same comparisons may be made by means of
pie diagrams. It seems clear that comparisons between the years are made
more easily from the bars than from the circles. The guide lines
running from section to section assist in making comparisons from the bar
chart: when the lines are parallel, there has been no change; when they
diverge, there has been an increase; when they converge, a decrease has
occurred.

The comparison of component parts in Chart 6.17 is on a relative basis;
the proportion of each age group in the population is shown. When we
indicate how many of each age group were enumerated, we have diagrams
such as are shown in Chart 6.18. The bars and circles vary in size
because the total population has increased. In this instance the bar
chart is clearly preferable to the pie diagram. When data such as those
shown in Charts 6.17 and 6.18 cover a number of years, it is generally
preferable to make use of curves, as was done in Charts 4.23 and 4.24.
While the bar charts of Charts 6.17 and 6.18 present chronological data,
we may also compare component parts for different places or categories.
For example, we might compare the proportions of males and females in
the urban population with the proportions of males and females in the
rural population. One bar, subdivided for males and females, would
represent the urban population; the other bar, similarly divided for the
sexes, would represent the rural population.

² See “A Percentage Protractor,” by Frederick E. Croxton, Journal of the
American Statistical Association, March 1922, pp. 108-109.

Chart 6.13. Proportion of the Population of the United States in Each
Specified Age Group, 1950. Data from Bureau of the Census, U. S. Census of
Population, 1950, Vol. II, Part 1, United States Summary, p. 1-93.

Chart 6.14. Proportion of the Population of the United States in Each
Specified Age Group, 1950. Data from source given below Chart 6.13.

Chart 6.17. Proportion of the Population of the United States in Each
Specified Age Group, 1910-1950. Data from sources given below Chart 4.23.

Chart 6.18. Population of the United States in Each Specified Age Group,
1910-1950. (Scale: millions of persons.) Data from sources given below
Chart 4.23.
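The conversion of percentages to protractor degrees described above for pie diagrams can be sketched in a few lines of Python. This is only an illustration: the function name `central_angles` and the four shares are assumptions made for the example and are not figures from any chart in this chapter.

```python
def central_angles(percentages):
    """Convert percentage shares of a total to central angles in degrees.

    A full circle is 360 degrees and the shares total 100 per cent,
    so each percentage is multiplied by 3.6.
    """
    return [round(p * 3.6, 1) for p in percentages]


# Hypothetical shares of a total (they add to 100 per cent).
shares = [50.0, 25.0, 15.0, 10.0]
angles = central_angles(shares)

for share, angle in zip(shares, angles):
    print(f"{share:5.1f} per cent -> {angle:5.1f} degrees")
```

A 50-per-cent section spans a diameter (180 degrees) and a 25-per-cent section a right angle (90 degrees), which agrees with the remark above that these two sections are the easiest to gauge.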


STATISTICAL MAPS 

Statistical maps are graphic devices which show quantitative information
on a geographical basis. We shall consider hatched or shaded maps,
dot maps, and pin maps.

Hatched maps. Hatched or shaded maps undertake to show for each
geographical area under consideration the magnitude of the phenomenon
which is being studied. The variations in magnitude are represented
graphically by progressive differences in hatching or shading. In Chart
6.19 the various hatchings indicate the “levels of living” of farm-operator
families in the counties of the United States in 1950. The counties
having the highest levels of living are shown in solid black, and the
hatching becomes progressively lighter so that the lightest indicates the
counties which had the lowest levels of living. The outstanding
characteristic of maps such as this is that a progressive change in the
hatching or shading indicates an increase (or decrease) in the phenomenon
being measured.

Chart 6.19. A Hatched Map. (Farm-operator family data, based on 1950
county indexes. U. S. Department of Agriculture, Bureau of Agricultural
Economics.)

Sometimes statistical maps are made in colors. However, the principle 
of progressive shading cannot be developed satisfactorily by using dif- 
ferent colors. It is possible, of course, to use progressive shades of a sin- 
gle color and thus sometimes to produce a more attractive map than 
could be done by using black and white. 

Dot maps. The preceding statistical map showed data that applied
to entire areas (specifically, the average level of living for counties), and
so a hatched or shaded map was appropriate. When the geographical
distribution of occurrences is to be shown, the dot map should be used.
Chart 6.20 shows one of the simplest of dot maps. Each dot represents
500 farms, and the concentration in various parts of the country is clearly
shown. In a dot map, the number of units represented by one dot may
be large, as in Chart 6.20, so that the number of dots in a region is small
enough to be counted, or the number of units represented by one dot
may be small, so that the numerous dots give the effect of a gradual
change in intensity of shading from light to dark. Which technique to
use depends on the purpose of the chart.

Chart 6.20. A Dot Map.

A different sort of dot map is shown in Chart 6.21, which uses dots of
varying size. In this study, 4,030 truck drivers were stopped at various
places and were asked how long they had been driving and certain other
correlative questions. The areas of the circles indicate the relative
number of drivers questioned at each point. While the varying circle sizes
indicate clearly that more drivers were quizzed at certain places than at
others, it is not easy to make accurate comparisons from these dots. We
cannot compare diameters directly. We must remember that, if one
circle has a diameter twice as great as another, then the first circle has an
area four times that of the second.
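The relation between diameter and area noted above can be checked with a short Python sketch. The function names and the sample counts are hypothetical and serve only to illustrate how a diameter would be chosen so that the area of a circle, and not its diameter, is proportional to the quantity shown.

```python
import math


def area_ratio(diameter_a, diameter_b):
    """Ratio of the areas of two circles; area varies as the square of the diameter."""
    return (diameter_a / diameter_b) ** 2


def diameter_for_count(count, reference_count, reference_diameter):
    """Diameter that makes a circle's AREA proportional to `count`.

    `reference_count` is drawn with `reference_diameter`; every other
    diameter is scaled by the square root of the ratio of the counts.
    """
    return reference_diameter * math.sqrt(count / reference_count)


# Doubling the diameter quadruples the area.
print(area_ratio(2.0, 1.0))

# Hypothetical example: if 100 drivers are shown by a circle 1 unit across,
# 400 drivers call for a circle 2 units across, not 4.
print(diameter_for_count(400, 100, 1.0))
```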

Pin maps. Pin maps may be thought of as a particularly flexible sort
of dot map. They consist of maps mounted on a backing of cork, cardboard,
wallboard, corrugated cardboard, or the like, on which information
is recorded by means of pins having (usually) glass heads of different
size, color, and shape. The available pins have heads that range in
size from about xg- inch to about | inch in diameter. A large number of
colors is available as well as a variety of shapes, such as round-, square-,
and triangular-head pins. Pin maps may be readily altered as the facts
change. Because of this flexibility and the wide variety of pins available,
the pin map is frequently employed as a method of presenting
geographical data. An extensive pin-map scheme, involving one or more
maps mounted on cork and hundreds or thousands of pins, is expensive
but may often prove very useful.

Chart 6.21. Number of Drivers Interviewed and Location of
Interview in a Study of Driving Practices of Truckers. Reproduced
from National Safety Council, How Long on the Highway?, 193€, p. 19. Note
that five of the states are not identified.

Chart 6.22. An Automobile Accident Pin Map of the City of Syracuse, New
York. From National Safety Council, Chicago, Illinois.

Chart 6.23. Map with Superimposed Bar Charts. (Legend: medical coverage;
population.) From Health Insurance Council, Accident and Health Coverage
in the United States, September 1953, p. 17.

Chart 6.22 shows a pin map used to record the location and result of
automobile accidents. By using one or more such maps, it is possible
to observe not only the frequency with which accidents occur at various
places, but also the nature of each accident (automobile hitting
pedestrian, automobile hitting automobile, automobile hitting fixed object, and
so forth) and the result of the accident (property damage, occupant
injured, occupant killed, pedestrian injured, pedestrian killed, and so on).

One difficulty with the statistical map is that the importance of
different regions is not to be judged by their areas. For instance, a hatched
map showing income per family in different states would be somewhat
misleading because there are many more families in some of the states
occupying very small areas than there are in other states occupying very
large areas. An interesting device sometimes used for overcoming this
difficulty consists of drawing the map in such a way that the area of each
state is in proportion to the number of families in that state.

Occasionally a map and some other type of chart are used in combination.
Chart 6.23 shows a map on which four simple bar charts have
been superimposed. The original of the map had the bars for hospital,
medical, and surgical coverage in red and the bars for population in
black. With the geographical areas separated, the reader may visualize
exactly what territory is referred to in each instance.



CHAPTER 7 


Rates, Ratios, and Percentages 


It was pointed out in the chapter dealing with statistical tables that
derived figures are useful to assist in summarizing and comparing data.
In that chapter specific mention was made of rates,¹ ratios, percentages,
and averages. This chapter will discuss rates, ratios, and percentages.
Averages and related measures will be examined in later chapters.

To express the ratio which 753 bears to 251, we divide 753 by 251,
which gives 3, and we say that 753 is to 251 as 3 is to 1, or more briefly,
753:251::3:1. We have thus indicated the relationship which the first
of these two numbers bears to the second as a ratio to one. If it suited
our purpose better, we could express the relationship as a ratio to any
other number. For example, we could use a ratio to ten, saying
753:251::30:10; we could use a ratio to one hundred and write
753:251::300:100. This last ratio, per hundred, is generally referred to as a
percentage, and we note that 753 is 300 per cent (from per centum) of 251.
It will thus be seen that percentages, which are used so frequently, are
merely special cases of the more general concept of ratios. If, instead
of using a ratio per hundred, we find occasion for a ratio per thousand,
we may refer to our figures as “per mille.”
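The same comparison of 753 with 251 can be worked as a small Python sketch. The function name `ratio_per` and its `per` parameter are hypothetical; they simply make explicit that a ratio to one, a percentage, and a figure per mille differ only in the multiplier applied after dividing by the base.

```python
def ratio_per(value, base, per=1):
    """Express `value` in relation to `base`, scaled to `per` units of the base."""
    return value / base * per


print(ratio_per(753, 251))          # ratio to one
print(ratio_per(753, 251, 10))      # ratio to ten
print(ratio_per(753, 251, 100))     # ratio per hundred (percentage)
print(ratio_per(753, 251, 1000))    # ratio per thousand (per mille)
```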

Ratios are computed in order to expedite comparisons. Not only are
large numbers reduced as in Table 3.4, but much is gained by comparing a
series of figures with a rounded base of 100 (which can be carried in one's
mind) rather than by attempting to compare each individual population
figure with the total for the entire United States. Relative change may
be visualized more concretely when shown by percentages, as in Table
7.1, or when shown by one of the methods used in Table 7.2.

¹ The term rate is sometimes used to mean the amount or quantity of one variable
considered in relation to one unit of a different variable. Thus, 20 miles per hour
is a rate of speed. The relationship that two similar variables bear to each other is
often termed a ratio. For example, the current ratio, which is the ratio of current
assets to current liabilities, compares two figures which are both in terms of dollars.
General usage does not always observe this distinction between rate and ratio.

TABLE 7.1

Acres Harvested of Selected Grains in the United States,
1951 and 1952

                    1951            1952
Grain            (thousands      (thousands      Per cent
                  of acres)       of acres)      increase*

Corn               80,736          81,359           0.8
Wheat              61,492          70,585          14.8
Oats               36,525          38,643           5.8
Barley              9,436           8,264         -12.4
Rice                1,967           1,972           0.3
Rye                 1,710           1,385         -19.0
Buckwheat             201             161         -19.9

* A minus sign denotes a decrease.

Data from Crop Reporting Board, U. S. Bureau of Agricultural Economics,
Crop Production, March 19, 1953, p. 11.


TABLE 7.2

Production of Steel Ingots and Steel for Castings in the United
States, 1943-1952

        Production                 Per cent     Per cent of    Per cent increase*
        (thousands of   Per cent   increase*    preceding      over preceding
Year    short tons)     of 1943    over 1943    year           year

1943       88,836        100.0
1944       89,642        100.9        0.9         100.9              0.9
1945       79,702         89.7      -10.3          88.9            -11.1
1946       66,603         75.0      -25.0          83.6            -16.4
1947       84,894         95.6       -4.4         127.5             27.5
1948       88,640         99.8       -0.2         104.4              4.4
1949       77,978         87.8      -12.2          88.0            -12.0
1950       96,836        109.0        9.0         124.2             24.2
1951      105,200        118.4       18.4         108.6              8.6
1952       93,156†       104.9        4.9          88.6            -11.4

* A minus sign denotes a decrease.
† Preliminary.

Data from various issues of the Survey of Current Business.


CALCULATION 

When one or more numbers are being compared to another number, the
figure to which comparisons are made is known as the base. A ratio is
found by dividing² the figure which is being compared to the base by
the base. The figure is then expressed in terms of or in relation to the
base, and ratios of all sorts are therefore sometimes referred to as relative
numbers or relatives.

² Instructions for operating calculating machines may be obtained from the sales
offices of the calculating machine companies.

The amount of money in circulation in the United States on June 30,
1943, was $17,421,261,974. On June 30, 1952, the circulating medium
totaled $29,025,925,276. To state the 1952 circulation in terms of the
1943 circulation (the base), we divide $29,025,925,276 by $17,421,261,974
and obtain 1.6661. This means that the money in circulation in 1952
was 1.6661 times as great as in 1943. In many instances, ratios are most
useful when stated as percentages. To change 1.666, the ratio to one,
to a ratio per hundred, the decimal point is moved two places to the
right; the resulting figure, 166.6, indicates that money in circulation in
1952 amounted to 166.6 per cent of the amount in circulation in 1943.

It should be noticed that there are two ways in which we can express
the percentage figure just given. Instead of saying that 1952 circulation
was 166.6 per cent of 1943 circulation, we may say that circulation
in 1952 was 66.6 per cent greater than in 1943. In the first instance, we
compared the figures for the two years; in the second, we compared the
change which took place³ with the figure for 1943.
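The two ways of stating the comparison can be reproduced from the circulation figures quoted above. The following Python sketch is illustrative only; the variable names are assumptions introduced for the example.

```python
# Money in circulation, in dollars, as quoted in the text.
base_1943 = 17_421_261_974
value_1952 = 29_025_925_276

ratio_to_one = value_1952 / base_1943                         # 1952 relative to 1943
per_cent_of_base = ratio_to_one * 100                         # "per cent of 1943"
per_cent_greater = (value_1952 - base_1943) / base_1943 * 100 # "per cent greater than 1943"

print(round(ratio_to_one, 4))       # 1.6661
print(round(per_cent_of_base, 1))   # 166.6
print(round(per_cent_greater, 1))   # 66.6
```

The second percentage is always 100 less than the first, since it relates only the change, and not the whole 1952 figure, to the base.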

EFFECT OF CHANGING BASE 

Naturally, a different set of percentages would be obtained if we
compared the 1943 circulation figure with the 1952 figure. We are now
using 1952 as the base and the 1943 figure is divided by that for 1952.
Performing this operation indicates that circulation in 1943 was 60.0 per
cent of that in 1952, or that circulation in 1943 was 40.0 per cent less
than that in 1952. Observe that, while the 1952 figure was 66.6 per cent
greater than the 1943 figure (1943 was the base), the 1943 figure was
40.0 per cent less than the 1952 figure (1952 was the base). This
difference is, of course, due to the fact that the basis of comparison was first
in reference to 1943, then to 1952. If a number is increased 100 per
cent, the second number need be decreased but 50 per cent to arrive at
the original figure. Conversely, if a given number is decreased 50 per
cent, the second number must be increased 100 per cent to reproduce the
given number.

³ Suppose we are comparing two percentages, as 4.0 per cent and 9.0 per cent. We
may speak in absolute terms and say that 9.0 per cent is 5.0 per cent more than 4.0
per cent. We may speak in relative terms and say that 9.0 per cent is 125 per cent
greater than 4.0 per cent, or that 9.0 per cent is 225 per cent of 4.0 per cent. When
comparing percentages, it is advisable to make quite clear whether we are speaking in
absolute or relative terms.

The failure to realize the effect of this change of base may lead to the
drawing of false conclusions. A firm decreased the wages of its employees
15 per cent; later it increased the reduced wages 5 per cent; then it raised
these increased figures 5 per cent; and finally it increased these second
figures another 5 per cent. Afterwards it announced that the three
5 per cent increases put wages back where they were before the 15 per
cent reduction. Calculation will show that the new wages were really
98.4 per cent of the original wages before reduction. If the company
had given a single 15 per cent increase of the reduced wages, the new
wages would have been but 97.75 per cent of the original wages.
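The wage example can be verified by applying each percentage change to the figure that results from the change before it. The Python sketch below is a minimal illustration; the function name `apply_changes` and the starting level of 100 are assumptions chosen so that the result reads directly as a per cent of the original wages.

```python
def apply_changes(start, per_cent_changes):
    """Apply successive percentage changes, each based on the preceding result."""
    level = start
    for pct in per_cent_changes:
        level *= 1 + pct / 100
    return level


# Original wages taken as 100, so results read as per cent of the original.
three_raises = apply_changes(100.0, [-15, 5, 5, 5])
single_raise = apply_changes(100.0, [-15, 15])

print(round(three_raises, 1))   # 98.4
print(round(single_raise, 2))   # 97.75
```

Neither course restores the original wages, because every increase is figured on the reduced base rather than on the original one.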

Table 7.3 shows for selected percentages of increase the per cent which
the new number must be decreased to reproduce the original number. It
should be borne in mind that a per-cent-of-increase figure may be
indefinitely large; however, a per-cent-of-decrease figure of 100 indicates a
decline to zero, while a per cent of decrease of over 100 indicates a fall to
a negative quantity.

TABLE 7.3

Illustrations of Effect of Shifting Base in Calculating Percentages

                                              Per cent new number
Given         Per cent of       New           must be decreased to
number        increase          number        yield given number

  10           500.00           60.00              83.33
  10           200.00           30.00              66.67
  10           100.00           20.00              50.00
  10            50.00           15.00              33.33
  10            33.33           13.33              25.00
  10            25.00           12.50              20.00
  10            10.00           11.00               9.09
  10             5.00           10.50               4.76
  10             1.00           10.10               0.99
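The last column of Table 7.3 follows from a simple relation: if a number is increased by p per cent, the new number must be decreased by p / (100 + p) of itself to return to the original. A short Python sketch, with a hypothetical function name, reproduces the column from the percentages of increase.

```python
def decrease_to_restore(per_cent_increase):
    """Per cent by which the increased number must fall to yield the original.

    The increase is measured on the original number; the decrease is
    measured on the new (larger) number, hence the larger denominator.
    """
    return per_cent_increase / (100 + per_cent_increase) * 100


for increase in (500, 200, 100, 50, 25, 10, 5, 1):
    print(f"{increase:6.2f}  {decrease_to_restore(increase):6.2f}")
```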

RECORDING PERCENTAGES 

Generally percentages are recorded to one decimal place. If the per- 
centages are based upon large figures, and particularly if one, or more than 
one, part of a total is quite small (see Table 3.4), it may be desirable to 
use more than one decimal. Occasionally only whole percentages are
shown, in order that relationships may be grasped readily. Whole
percentages will not suffice, however, when the relative variations are
extremely small.

Percentages should not be calculated if the absolute numbers are small, 
especially if the base is appreciably less than 100. A serious difficulty 
arising out of the use of percentages based on small absolute numbers is 
discussed on page 150. 

When percentages are to be recorded with one decimal, they are
rounded to the nearest tenth of one per cent. The following examples
will indicate the procedure in rounding percentages (and also in
rounding other calculations⁴ involving remainders):

(1) $371.16 ÷ $679.28 = 0.5464, or 54.64 per cent. The second
decimal is less than 5 and therefore this percentage, to the nearest tenth of
one per cent, is 54.6.

(2) 2,319 pounds ÷ 7,532 pounds = 0.3079, or 30.79 per cent. In
this instance the second decimal is more than 5 and the percentage
should be recorded as 30.8.

(3) 280,511 feet ÷ 11,000,000 feet = 0.025501, or 2.5501 per cent.
Here the second decimal is 5, but there is a remainder which results in
the 1 in the fourth decimal place. Recorded to the nearest tenth of one
per cent, this figure is 2.6.

(4) 1,341 barrels ÷ 6,000 barrels = 0.2235, or 22.35 per cent. Here
the nearest tenth is either 22.3 or 22.4. It does not greatly matter
whether occasional results such as this are raised in the first decimal
place or whether the second decimal is dropped. However, it is better
to follow some consistent scheme. Particularly when many computations
are being made which are eventually to be added, it is well to
employ a method which will cause half of the values with a second
decimal of exactly 5 to be raised and half to be lowered. This practice will
avoid the accumulation of errors. Probably the most satisfactory scheme
is to raise the first decimal when the first decimal is an odd number (67.35
becomes 67.4) and to drop the second decimal when the first decimal is
an even number (67.65 becomes 67.6).

⁴ See Appendix T for a more comprehensive discussion of rounding numbers.
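The scheme recommended in example (4), raising an odd first decimal and leaving an even one, is the rule commonly called rounding half to even. As a hedged illustration, Python's standard `decimal` module offers this rule as `ROUND_HALF_EVEN`; the function name `round_to_tenth` is an assumption made for the sketch. The percentages are passed as text so that a figure such as 22.35 is held exactly rather than as a binary approximation.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_to_tenth(percentage_text):
    """Round a percentage to one decimal place, sending exact halves to the even digit."""
    return Decimal(percentage_text).quantize(Decimal("0.1"), rounding=ROUND_HALF_EVEN)


# The four worked examples above, followed by the two illustrations of the rule.
for figure in ("54.64", "30.79", "2.5501", "22.35", "67.35", "67.65"):
    print(figure, "->", round_to_tenth(figure))
```

Under this rule the 22.35 per cent of example (4) is recorded as 22.4, since its first decimal, 3, is odd.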

Reference to the percentage data shown in the last column of Table 8.6 
will reveal that the eleven percentages add to 99.9 rather than to 100.0. 
This is the consequence of rounding all percentages to one decimal place, 
which sometimes results in totals of 99.9 or 100.1 and occasionally shows 
99.8 (as in the next-to-the-last column of Table 8.6) or 100.2. Some 
statisticians adjust one of the percentages in order to produce the cor- 
rect total (see note below Table 7.5), but it seems preferable to let each 
percentage stand correctly rounded, as in Table 8.6. 

TYPES OF COMPARISONS

We have already seen an instance in which the parts of a whole were 
compared to the total in Table 3.4. Here the percentages were obtained 
by dividing each item in turn by the total. More expeditiously we may 
take the reciprocal of the total and multiply the reciprocal by each of the 
component figures. This is a time-saving device adapted particularly to 
the calculating machine, and is applicable whenever we are dividing a 
series of numbers by a constant number. 
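The reciprocal device can be sketched as follows. The component figures here are hypothetical (they are not the figures of Table 3.4), and the function name is an assumption; the point is only that one division is performed and every part is then multiplied by the same constant.

```python
def percentages_of_total(parts):
    """Express each part as a per cent of the total, using the reciprocal of the total."""
    total = sum(parts)
    reciprocal = 1 / total          # computed once
    return [round(part * reciprocal * 100, 1) for part in parts]


# Hypothetical component figures that add to 1,000.
parts = [371, 129, 500]
print(percentages_of_total(parts))
```

On a modern computer the saving in labor is negligible, but the arithmetic is the same as that described for the calculating machine.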

Various illustrations of comparisons of one figure with another figure 
are given on later pages in this chapter. For instance, in the paragraph 
on sex ratios it is noted that each figure for males is divided by the
appropriate figure for females, since the sex ratio consists in stating the
number of males per 100 females.

Table 7.2 indicates a number of different comparisons which may be
made in regard to data arranged chronologically. In column 3, the
production of steel ingots and steel for castings for each year is compared
with the 1943 production; each figure is divided by that for 1943.
Column 4 shows the percentage by which the production for each year
exceeded that for 1943; each year's numerical increase or decrease over
1943 is divided by the 1943 production. In column 5, the production
each year is related to that of the preceding year; each year's figure is
divided by that for the preceding year. Column 6 indicates the per cent
of increase or decrease of each year in relation to the preceding year; the
numerical increase (or decrease) of each year over the preceding year is
divided by the production for the preceding year. In columns 3 and 4,
comparisons are made with a fixed base, 1943. In columns 5 and 6, the
base is constantly shifting, being always the preceding year.
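The four percentage columns of Table 7.2 can be recomputed from the production figures alone. The Python sketch below is one way of doing so; the function names are hypothetical, and the production figures are those printed in the table.

```python
# Production of steel ingots and steel for castings, thousands of short tons
# (figures copied from Table 7.2).
production = {
    1943: 88836, 1944: 89642, 1945: 79702, 1946: 66603, 1947: 84894,
    1948: 88640, 1949: 77978, 1950: 96836, 1951: 105200, 1952: 93156,
}


def per_cent_of(value, base):
    """Columns 3 and 5: the figure as a percentage of the base."""
    return round(value / base * 100, 1)


def per_cent_increase(value, base):
    """Columns 4 and 6: the numerical change divided by the base."""
    return round((value - base) / base * 100, 1)


years = sorted(production)
fixed_base = production[1943]

for previous, year in zip(years, years[1:]):
    value = production[year]
    shifting_base = production[previous]   # the preceding year
    print(
        year,
        per_cent_of(value, fixed_base),
        per_cent_increase(value, fixed_base),
        per_cent_of(value, shifting_base),
        per_cent_increase(value, shifting_base),
    )
```

Only the base differs between the two pairs of columns: it is held at the 1943 figure for columns 3 and 4 and moves forward one year at a time for columns 5 and 6.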

Another application of percentages is shown in Table 7.1. Here the
1951 figure for each crop is the base. The percentage column headed
“per cent increase” indicates the relative increase or decrease in the
acreage harvested of each crop from 1951 to 1952.

SOME FREQUENTLY USED RATIOS 

The following paragraphs indicate a few interesting applications of 
ratios and percentages. The reader will doubtless become aware of 
many others as he reads more or less technical material in magazines, 
newspapers, books, and advertisements. 

Index numbers. Most index numbers are presented in the form of
percentages.⁵ In the construction of an index number of wholesale
prices, for example, the commodities to be included are selected first,
and their prices are then combined with due regard to the varying
importance of the different commodities. If the index number is a
chronological one, as is usually the case, some year may be designated as the base
and prices in that year are set equal to 100. The prices for the other
years are then expressed in relation to that base year. The United
States Bureau of Labor Statistics uses the average of the years 1947-1949
as the base year for its index numbers of approximately 2,000
wholesale prices. Wholesale prices during these three years are
therefore represented by 100. The wholesale price index number for June
1950 was 100.2; for December 1952, it was 109.6; for January 1953, it
was 109.9; for February 1953, it fell to 109.6. Prices for these months
are thus expressed in terms of the average for the thirty-six months of
1947-1949.

⁵ See Chapters 17 and 18 for a more complete discussion of index numbers.

Sex ratio. The relationship of the number of males to the number of
females in the population is given by the sex ratio, which states the
number of males per 100 females. In 1950 there were 74,833,239 males and
75,864,122 females in the United States. There were thus 98.6 males
per 100 females in this country. The ratio varied for the different states.
It was lowest in Massachusetts, where there were 93.8 males per 100
females, and highest in Wyoming, where there were 114.1 males per 100
females. The various nativity groups in the population showed
different sex ratios. Negroes had 94.3 males per 100 females; native Whites,
98.6 males per 100 females; foreign-born Whites, 103.8 males per 100
females; Japanese, 117.7 males per 100 females; and Chinese, 189.6 males
per 100 females.

Population density. Instead of merely comparing the total popu- 
lation of two communities, it may often be more meaningful to consider 
the density of the population. We do this by dividing the total popula- 
tion by the area in square miles, and thus determine the number of per- 
sons per square mile. For example, in 1950 the population of Montana 
was 591,024 and the population of New Hampshire was 533,242. If we 
relate these figures to the land area of each state, we find that New 
Hampshire had 59.1 persons per square mile, while Montana had but 
4.1 persons per square mile. These figures do not, of course, mean that 
there were 59 or 60 persons on every square mile in New Hampshire and 
4 or 5 persons on every square mile in Montana. They are merely
summary figures indicating that, on the average, there were the indicated
number of persons per square mile in each state.

Population density may also be used in making chronological compari- 
sons. As our country has grown older, the population density has 
increased. In 1800 there were 6.1 persons per square mile in the United 
States; in 1950 there were 50.7 persons per square mile. 

Ratios per capita. Many figures are more meaningful or more useful 
when expressed on a per capita basis. The Federal debt of the United 
States reflects not only the level of expenditures in past years and 
increases in government services, but also the growth of population. 
For example, on June 30, 1940, the Federal debt was $48,497,000,000; 
by June 30, 1952, the figure had grown to $259,151,000,000. If these 
figures are divided by the population at the two periods, it appears that 
the per capita Federal debt was $367 on June 30, 1940, and $1,650 on
June 30, 1952.

The consumption of various commodities is frequently stated on a per 
capita basis. Thus in 1952 the estimated consumption of beef was 61.0
pounds per capita; the estimated consumption of eggs was 409 per capita;
the approximate amount of refined sugar consumed was 95.9 pounds per 
capita. 

Death rates. The crude, gross, or general death rate for a given year 
is obtained by dividing the number of deaths occurring in a community 
during that year by the mid-year population of that community, and 
expressing the result per thousand. In 1952 there were in the United 
States an estimated 1,494,000 deaths from all causes. The July 1, 1952, 
population, resident in the United States, was estimated to be
155,767,000. The death rate for 1952 was therefore

1,494,000 ÷ 155,767,000 = 0.0096, or 9.6 per thousand.
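As a minimal sketch, the same division can be written as a small Python function. The name `crude_death_rate` and the `per` parameter are assumptions made for illustration; the `per` argument simply allows the result to be stated per 1,000 or per 100,000 as required.

```python
def crude_death_rate(deaths: int, midyear_population: int, per: int = 1_000) -> float:
    """Deaths during the year divided by the mid-year population, stated per `per` persons."""
    return per * deaths / midyear_population


# 1952 estimates quoted in the text.
rate_1952 = crude_death_rate(1_494_000, 155_767_000)
print(f"{rate_1952:.1f} per thousand")  # prints: 9.6 per thousand
```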

It will be seen that the accuracy of a death rate depends first upon the
degree of completeness of the registrations of deaths, and second upon the
accuracy of the mid-year population estimate used as the base. Since
population counts are made only once in 10 years, most of the population
figures used must be estimates. When the population is estimated for a
year falling between two censuses, the estimate is termed an inter-censal
estimate; when the estimate is for a year after a census, it is termed
a post-censal estimate. Inter-censal estimates are naturally somewhat
more accurate than post-censal estimates. For the years 1951 to 1959
inclusive, death rates must at present be based upon post-censal
estimates and are called preliminary rates. After the 1960 census results
are available, inter-censal estimates may be made for the years
1951-1959, and the death rates may be recomputed upon the basis of these
new population estimates. Such rates are called revised rates.

When the deaths occurring in a state or city are divided by the
population of that community, the resulting crude death rate is subject to
certain corrections. For example, in any given year people may die in a
community who are residents elsewhere, and also some residents of any large
community may die outside of that community. If the non-resident
deaths are deducted from those which occurred in the community, the
resulting rate is referred to as a local rate. If, in addition, the deaths of
residents occurring outside of that community are added, the resulting
rate is referred to as a resident rate. Failure to recognize these important
differences may lead to drawing false conclusions. In February 1935 it
was announced that the death rate for Queens borough of New York City
was 6.5 per 1,000, for Bronx 7.8, for Brooklyn 9.3, for Richmond 13.5, and
for Manhattan 16.3. The death rate for Queens was lower than for any
other such community in the United States, and at least one newspaper
promptly announced that Queens was "the healthiest place in the
country." It was very quickly pointed out, however, that Queens possessed
a very low quota of hospitals and that, therefore, some residents of Queens
in need of hospital care would seek it in Manhattan or elsewhere.
Hospital cases naturally show a very high death rate, and a crude death rate
would not reflect the fact that some persons dying in Manhattan and
elsewhere were really residents of Queens.

Death rates for particular classes of the population (males and females, 
various age groups, and other categories) and for particular diseases or 
causes are referred to as specific death rates. Because the deaths from 
any one cause are relatively few, cause-specific rates are usually stated per 
100,000 of the population. Thus in 1951 the death rate for rheumatic
fever was 1.1 per 100,000. 

An intelligent comparison of the death rates of different communities 
involves consideration of the fact that the proportions of the sexes may 
differ and also that there may be differences in the age distributions, in 
the racial and nativity composition of the inhabitants, in occupations, 
and in other factors. A discussion of these differences and the methods 
of computing adjusted and standardized death rates is too specialized a 
topic to be treated in this text.⁶

Birth rates. Birth rates are usually calculated by dividing the births 
during a year by the mid-year population for that year. Just as in the 
case of death rates, we may have preliminary rates and revised rates. We 
may also have gross, local, and resident rates. Stillbirths are not counted 
as births, although they have been so counted in the past; this fact should 
be remembered in making chronological comparisons. Perhaps it is also 
worth while calling attention to the fact that the registration of births is 
not so complete as is the registration of deaths. A death must be regis- 
tered before a burial permit may be issued and before interment may be 
made. A newborn infant, however, may be absorbed into the family 
and the community whether or not his birth is registered. 

The calculation of birth rates in relation to the total population is not
thoroughly satisfactory, since the proportion of "child producers" in the
population is not constant either from time to time or from place to place.
Refinements in the calculation of birth rates are beyond the scope of this
volume.⁷

⁶ See F. E. Linder and R. D. Grove, Vital Statistics Rates in the United States,
1900-1940, Federal Security Agency, Public Health Service, National Office of Vital
Statistics, 1947, and Vital Statistics of the United States, issued annually by the same
office.

⁷ The references given in footnote 6 discuss birth rates in more detail and also
describe various other vital rates and ratios, such as morbidity rates, case-fatality
ratios, marriage rates, divorce rates, fertility rates, stillbirth ratios, and others.

Crop yields per acre. Data of the total amount of a crop produced
may tell us whether or not there is more of that commodity available in
one year than in another. From such figures, however, we cannot know
if an increase may have been due to a more abundant yield, to an increase
in acreage, or to both. In 1951 there were 980,810,000 bushels of wheat
harvested from 61,492,000 acres in the United States; in the following
year, 70,585,000 acres yielded 1,291,447,000 bushels. Both the acreage
harvested and the total yield had risen, and the yield per acre had
increased as well: it was 16.0 bushels in 1951 and 18.3 bushels in 1952.
On a geographical basis, the United States, which produces more wheat
than any other country for which figures are available, is not first in
yield per acre. Canada, producing a little more than half as much wheat
in 1952 as did the United States, had a yield per acre of 26.5 bushels; and
Western Germany, which produced about one-tenth as much wheat as
did the United States, showed 41.2 bushels per acre in 1952.
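The United States wheat figures above lend themselves to a short computation. The sketch below, with variable names of our own choosing, computes the yield per acre for each year and then separates the rise in total production into the part attributable to acreage and the part attributable to yield. The decomposition is only an illustration of the question raised in the text and is not a procedure given in the original.

```python
# Wheat harvested in the United States: (bushels, acres), as quoted in the text.
harvests = {
    1951: (980_810_000, 61_492_000),
    1952: (1_291_447_000, 70_585_000),
}

# Yield per acre is total production divided by acreage harvested.
yields = {year: bushels / acres for year, (bushels, acres) in harvests.items()}
for year, bushels_per_acre in yields.items():
    print(f"{year}: {bushels_per_acre:.1f} bushels per acre")

# Production = acreage x yield, so the ratio of the two years' production
# is the product of the acreage ratio and the yield ratio.
production_ratio = harvests[1952][0] / harvests[1951][0]
acreage_ratio = harvests[1952][1] / harvests[1951][1]
yield_ratio = yields[1952] / yields[1951]
print(f"production x{production_ratio:.3f} = acreage x{acreage_ratio:.3f} * yield x{yield_ratio:.3f}")
```

Both factors exceed 1, which is the sense in which the increase was due "to both."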

Hog-corn ratio. The hog-corn ratio is the result of dividing the
average price per 100 pounds which farmers receive for hogs by the
average price per bushel which farmers receive for corn. For example, if, as
on January 15, 1953, farmers are receiving $17.80 per 100 pounds for
hogs and $1.48 per bushel for corn, the ratio is $17.80 ÷ $1.48 = 12.0.

This ratio may be interpreted to mean that 100 pounds of hogs are 12.0 
times as valuable as a bushel of corn or, more simply, that 12.0 bushels 
of corn are equal in value to 100 pounds of hogs. On April 15, 1952 hogs 
brought $16.40 per 100 pounds and corn yielded the farmer $1.68 per 
bushel. At that time the ratio was 9.8. Over the 6-year period 1947- 
1952, the hog-corn ratio averaged about 13.2, falling as low as 9.2 in 
May 1948 and reaching 19.8 in February 1947. When the ratio is low, 
it is more profitable for farmers to sell their corn outright than to feed 
the corn to hogs being fattened for market. When the ratio is high, it 
becomes more profitable for the farmer to feed corn to his hogs than to 
sell the corn outright. Since corn is the principal element of cost in pro- 
ducing hogs for market, the ratio is used as an indicator of the desirabil- 
ity of future expansion or contraction of hog production. There is thus 
a relationship between the hog-corn ratio and the hog production cycle. 
When the ratio is high, an increase in hog production tends to follow. 
Such an increase is frequently followed by a decline in hog prices in
relation to corn prices, and there then follows a tendency to restrict hog
production. Curves showing hog-corn ratios, by months, for 1948-1952 are
shown in Charts 5.14 and 5.15.
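The two hog-corn ratios quoted above can be reproduced with a brief sketch. The function name and its parameter names are assumptions for illustration; the prices are those given in the text.

```python
def hog_corn_ratio(hog_price_per_100_lb: float, corn_price_per_bushel: float) -> float:
    """Bushels of corn equal in value to 100 pounds of hogs."""
    return hog_price_per_100_lb / corn_price_per_bushel


january_1953 = hog_corn_ratio(17.80, 1.48)
april_1952 = hog_corn_ratio(16.40, 1.68)
print(f"January 15, 1953: {january_1953:.1f}")  # prints: January 15, 1953: 12.0
print(f"April 15, 1952:   {april_1952:.1f}")    # prints: April 15, 1952:   9.8
```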

Batting averages. The familiar batting average of the sport pages of
the daily paper is a ratio of the hits made by a batter in relation to the
total number of times he was at bat. Table 7.4 shows a series of selected
batting averages. The figures in the last column of Table 7.4 may be
correctly thought of as either ratios to one or as averages of a series of
observations, each having a value of 1 or 0 (that is, either the batter did
or did not make a hit). If a man has been at bat 75 times and has made
25 hits, his batting average would be shown as .333 and is spoken of as
"three hundred and thirty-three." If he had made a hit every time he
was at bat, his figure would be 1.000, which is referred to as "a
thousand." Notice that certain contradictions are involved in some of the
terms used to refer to these data. The column of figures is frequently
headed "percentage"; the figures are printed as ratios to one; the figures
are spoken of as per thousand!


TABLE 7.4

Individual Batting Averages of 16 Outstanding American League Players, 1952

Player and club                     Games    Times     Hits    Batting
                                             at bat            average*

Fain, Ferris E., Philadelphia        145      538      176      .327
Mitchell, L. Dale, Cleveland         134      511      165      .323
Mantle, Mickey C., New York          142      549      171      .311
Kell, George C., Detroit-Boston      114      428      133      .311
Woodling, Eugene E., New York        122      408      126      .309
Goodman, William D., Boston          138      513      157      .306
Rosen, Albert L., Cleveland          148      567      171      .302
Avila, Roberto, Cleveland            150      597      179      .300
Fox, J. Nelson, Chicago              152      648      192      .296
Robinson, W. Edward, Chicago         155      594      176      .296
Di Maggio, Dominic P., Boston        128      486      143      .294
Bauer, Henry A., New York            141      553      162      .293
Nieman, Robert C., St. Louis         131      478      138      .289
Courtney, Clinton D., St. Louis      119      413      118      .286
Runnels, James E., Washington        152      555      158      .285
Groth, John T., Detroit              141      524      149      .284

* This column is headed "PCT" in the original table.

Data from American League of Professional Baseball Clubs, press release for December 14, 1952.
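The two readings of a batting average, as a ratio to one and as an average of observations valued 1 or 0, can be checked against each other with a short sketch. The function names are ours, and the leading zero is dropped only to imitate the sport-page convention.

```python
from statistics import mean


def batting_average(hits: int, times_at_bat: int) -> float:
    """Hits divided by times at bat, a ratio to one."""
    return hits / times_at_bat


def as_printed(average: float) -> str:
    """Format to three decimals without the leading zero, e.g. .333 or 1.000."""
    return f"{average:.3f}".lstrip("0")


# The example in the text: 25 hits in 75 times at bat.
outcomes = [1] * 25 + [0] * 50          # 1 = hit, 0 = no hit
print(as_printed(mean(outcomes)))        # prints: .333
print(as_printed(batting_average(25, 75)))    # prints: .333

# First row of Table 7.4 (Fain, Philadelphia).
print(as_printed(batting_average(176, 538)))  # prints: .327
```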


Airline accident ratios. The safety of air travel may be indicated by
means of ratios. The number of plane-miles flown during a year may be
divided by the number of accidents to obtain "plane-miles flown per
accident." In 1952 scheduled domestic air-carrier operators flew
447,158,490 plane-miles and 36 accidents occurred. The lines therefore flew
12,421,069 plane-miles per accident. In the same year, there were 5
accidents involving a fatality, and dividing the plane mileage flown by 5 gives
89,431,698 plane-miles per fatal accident. During 1952 there were 46
passenger fatalities as a result of airplane accidents on scheduled
domestic airlines, and it appears that these lines flew 9,720,837 plane-miles per
passenger fatality. Passenger fatalities may be related to
passenger-miles, and since scheduled domestic airlines flew 12,996,671,000
passenger-miles in 1952, we have 12,996,671,000 ÷ 46 = 282,536,326
passenger-miles flown per passenger fatality. Because of the small number of
accidents and fatalities involved, these ratios fluctuate tremendously from
year to year. For example, the passenger-miles flown per passenger
fatality were 80,910,867 in 1946; 31,725,186 in 1947; 75,249,940 in 1948;
76,032,710 in 1949; 87,118,531 in 1950; 79,111,993 in 1951; and
282,536,326 in 1952. It will be observed that, as air travel becomes safer,
all of the ratios mentioned will grow larger. Ratios of the number of
fatal accidents per million plane-miles and of the number of passenger
fatalities per 100 million passenger-miles may also be computed. Such
ratios would be reciprocals of those given and, as air travel becomes safer,
would become smaller.
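The ratios in the preceding paragraph, and the reciprocal form mentioned at its close, may be sketched as follows. The two helper functions are hypothetical names introduced for illustration; the 1952 mileage and accident counts are those quoted in the text.

```python
# 1952 figures for scheduled domestic airlines, as quoted in the text.
plane_miles = 447_158_490
passenger_miles = 12_996_671_000
accidents = 36
fatal_accidents = 5
passenger_fatalities = 46


def miles_per_event(miles: int, events: int) -> float:
    """Miles flown per accident or fatality; grows as travel becomes safer."""
    return miles / events


def events_per_miles(events: int, miles: int, per: int) -> float:
    """Reciprocal form: events per `per` miles; shrinks as travel becomes safer."""
    return events * per / miles


print(f"{miles_per_event(plane_miles, accidents):,.0f} plane-miles per accident")
print(f"{miles_per_event(plane_miles, fatal_accidents):,.0f} plane-miles per fatal accident")
print(f"{miles_per_event(plane_miles, passenger_fatalities):,.0f} plane-miles per passenger fatality")
print(f"{miles_per_event(passenger_miles, passenger_fatalities):,.0f} passenger-miles per passenger fatality")

print(f"{events_per_miles(fatal_accidents, plane_miles, 1_000_000):.3f} fatal accidents per million plane-miles")
print(f"{events_per_miles(passenger_fatalities, passenger_miles, 100_000_000):.2f} passenger fatalities per 100 million passenger-miles")
```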

The 100 per cent statement. When banks, insurance companies,
and other corporations present financial information to the public, they
find it effective to supplement the dollar figures with percentages. Thus,
a financial statement may show each asset as a percentage of all assets,
and each liability as a percentage of all liabilities. The procedure is
particularly effective when the dollar figures are large. Table 7.5 shows the
assets of the Provident Mutual Life Insurance Company as set forth in
an annual report. The actual figures, even though rounded, are too
large for the ordinary reader to grasp and compare, but the percentage
data make comparisons less difficult. In preparing such a percentage
statement, it is desirable not to show too many decimal places, else
comparisons cannot readily be made. A statement of the resources of a
bank carried all percentages to three decimal places. This was quite
unnecessary, particularly since the smallest item, "sundry securities,"
was 0.035 (0.0349) per cent and could have been shown as 0.03 per cent,
and since the second smallest item, "other assets," was 0.039 per cent
and could have been shown as 0.04 per cent. For popular presentation,
there is some advantage in lumping such small items together in order
to center attention upon the more important ones. These two small
items, if combined, would have appeared as 0.07 per cent, or as 0.1 per
cent with all percentages shown to but one decimal place. However, it
may have been desired to emphasize the smallness of either "sundry
securities" or "other assets," or both.

TABLE 7.5

Assets of the Provident Mutual Life Insurance Company, December 31, 1951
and December 31, 1952

                                                     Amount               Per cent of total
Asset                                          1951           1952          1951     1952

U. S. Government Bonds                     $117,789,000   $102,768,000      17.5     14.7
Canadian Government and Provincial Bonds     18,984,000      9,401,000       2.8      1.3
State and Municipal Bonds                     1,861,000      3,664,000        .3       .5
Public Utility Bonds                        162,207,000    169,425,000      24.1     24.3
Railroad Bonds                               42,520,000     37,031,000       6.3      5.3
Industrial Bonds                             98,643,000    123,168,000      14.7     17.6
Preferred and Guaranteed Stocks              19,337,000     19,979,000       2.9      2.9
Common Stocks                                13,700,000     14,657,000       2.0      2.1
Mortgage Loans                              151,076,000    170,748,000      22.4     24.5
Real Estate Held for Investment               2,821,000      2,740,000        .4       .4
Home Office Property                          2,000,000      1,900,000        .3       .3
Other Real Estate                             2,786,000      2,093,000        .4       .3
Loans on Policies of the Company             23,230,000     23,657,000       3.5      3.4
Cash                                          5,312,000      5,296,000        .8       .8
Other Assets                                 11,073,000     11,559,000       1.6      1.6

Total                                      $673,339,000   $698,086,000     100.0    100.0

Data from Provident Mutual Life Insurance Company of Philadelphia, Eighty-eighth Annual Report,
1952, p. 6. Several percentages above were adjusted by the company to make the total of each column
of percentages equal 100.0.
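A percentage statement of the kind shown in Table 7.5 can be sketched as below for the 1951 column. The function name `percent_statement` is an assumption for illustration. Note that the sketch rounds each percentage independently, so the rounded column need not total exactly 100.0; the source note to Table 7.5 indicates that the company adjusted several percentages for that reason, and this sketch makes no such adjustment.

```python
# Assets at December 31, 1951, in dollars, as listed in Table 7.5.
assets_1951 = {
    "U. S. Government Bonds": 117_789_000,
    "Canadian Government and Provincial Bonds": 18_984_000,
    "State and Municipal Bonds": 1_861_000,
    "Public Utility Bonds": 162_207_000,
    "Railroad Bonds": 42_520_000,
    "Industrial Bonds": 98_643_000,
    "Preferred and Guaranteed Stocks": 19_337_000,
    "Common Stocks": 13_700_000,
    "Mortgage Loans": 151_076_000,
    "Real Estate Held for Investment": 2_821_000,
    "Home Office Property": 2_000_000,
    "Other Real Estate": 2_786_000,
    "Loans on Policies of the Company": 23_230_000,
    "Cash": 5_312_000,
    "Other Assets": 11_073_000,
}


def percent_statement(amounts: dict, places: int = 1) -> dict:
    """Express each amount as a percentage of the total, rounded to `places` decimals."""
    total = sum(amounts.values())
    return {name: round(100 * value / total, places) for name, value in amounts.items()}


shares_1951 = percent_statement(assets_1951)
for name, share in shares_1951.items():
    print(f"{name:<42}{share:>6.1f}")
print(f"{'Total of rounded percentages':<42}{sum(shares_1951.values()):>6.1f}")
```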

Railroad ratios. The efficient operation of railroads necessitates the 
collection and use of a vast amount of statistical data in connection with 
which numerous ratios are calculated. The figures which follow are for 
Class I railroads for 1952. 

The investment per mile of line is obtained by dividing total invest- 
ment in road and equipment (including cash, materials, and supplies) by 
the number of miles of railroad line. This figure was $149,820 per mile, 
or, allowing for accrued depreciation, $118,072 per mile.

Freight revenue per ton-mile is obtained by dividing total freight reve- 
nue by the total number of ton-miles of freight hauled. The freight 
revenue per ton-mile was 1.430 cents. Similarly, we may compute the 
passenger revenue per passenger mile, which amounted to 2.663 cents. 

The operating ratio is the ratio of operating expenses to operating 
revenues. Operating expenses were $8,053,003,585, while operating 
revenues were $10,581,418,145. The operating ratio was 76.11 per cent. 
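The operating ratio just quoted follows directly from the two dollar figures. A minimal sketch, with variable names of our own choosing:

```python
# Class I railroads, 1952, as quoted in the text.
operating_expenses = 8_053_003_585
operating_revenues = 10_581_418_145

# Operating ratio: operating expenses as a percentage of operating revenues.
operating_ratio = 100 * operating_expenses / operating_revenues
print(f"Operating ratio: {operating_ratio:.2f} per cent")  # prints: Operating ratio: 76.11 per cent
```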

There are a number of other railroad ratios; the meaning of each is 
rather obvious. Enumerating a few: the gross revenue per ton of freight 
was $6.75; the haul per ton of freight was 427 miles; the revenue per
passenger was $1.93; the average trip per passenger was 72.5 miles; the rate
of return on aggregate property investment was 4.11 per cent; the hours
worked during the year per railroad employee were 2,320; the percentage
of unserviceable freight cars averaged 4.9 during the year; the ton-miles
per day per freight car were 973; the mileage per day per freight car was
46.2 miles.⁸

⁸ For these and other railroad ratios, see A Yearbook of Railroad Information, issued
annually by the Committee on Public Relations of the Eastern Railroads, New York.

The railroad ratios mentioned above are one type of business ratios.
Many sorts of business organizations compute diverse ratios for the better
functioning of the enterprise. Discussed in another volume⁹ are such
ratios as current ratio (current assets ÷ current liabilities), merchandise
turnover (net sales ÷ merchandise inventory), margin of profit (profit ÷
sales), and labor turnover (replacements ÷ number on payroll).

FAULTY USE OF PERCENTAGES 

Ratios and percentages are in such general use that it is not surprising 
to find them occasionally misused. Difficulties encountered in the
calculation and use of percentages can generally be traced to one of the
following causes: (1) confusion in regard to the base, (2) calculation of
percentages based on small absolute numbers, (3) misplaced decimal points, (4)
arithmetic mistakes, (5) improper procedure in averaging percentages.
These will be discussed in order.

Confusion in regard to base. Over a period of five years, from 1916 
to 1921, the enrollment in veterinary colleges in the United States 
declined from 3,160 to 641 students. The decrease was 2,519 students, 
or 79.7 per cent of the 1916 enrollment, yet the dean of a midwestern 
veterinary college was quoted as having said that from 1916 to 1921 the 
enrollment had decreased 500 per cent! The dean may have actually
said that the 1916 registration figure was about 500 per cent of the 1921 
figure. A decrease of 500 per cent would mean a negative enrollment 
four times the size of the 1916 registration. 

In the autumn of 1920 a determined effort was made by the United 
States district attorney to have restaurants in Pittsburgh lower their 
prices to a pre-war level. Newspapers announcing the success of the
drive stated that Pittsburgh restaurants had cut their prices 50 to 100 per
cent. It is, of course, clear that prices cannot be cut 100 per cent, else
the servings formerly sold would be given away. The price reductions
on a number of dishes were stated; the greatest reduction took place in
the price of doughnuts and pie. These had formerly sold at 15 cents per
order. Identical-size servings were sold at 5 cents after the reduction;
hence, the reduction amounted to 66.7 per cent of the former selling price.
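The distinction between a change stated as a percentage of the earlier figure and one figure stated as a percentage of another can be made concrete with a short sketch. The function names `percent_change` and `percent_of` are our own; the enrollment and price figures are those of the two illustrations above.

```python
def percent_change(old: float, new: float) -> float:
    """Change from old to new, as a percentage of the OLD figure (the base)."""
    return 100 * (new - old) / old


def percent_of(value: float, base: float) -> float:
    """Express value as a percentage of base."""
    return 100 * value / base


enrollment_1916, enrollment_1921 = 3_160, 641

decline = percent_change(enrollment_1916, enrollment_1921)
print(f"Change in enrollment: {decline:.1f} per cent")            # prints: -79.7 per cent

ratio = percent_of(enrollment_1916, enrollment_1921)
print(f"1916 figure as per cent of 1921: {ratio:.0f}")             # prints: 493

# A literal 500 per cent decrease would leave a negative enrollment.
after_500_per_cent_cut = enrollment_1916 * (1 - 500 / 100)
print(f"Enrollment after a 500 per cent cut: {after_500_per_cent_cut:,.0f}")

# Doughnuts and pie: 15 cents reduced to 5 cents.
print(f"Price change: {percent_change(15, 5):.1f} per cent")      # prints: -66.7 per cent
```

A decrease can reach at most 100 per cent of the base, which is why the choice of base must be stated.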

It is not at all unusual to see an advertisement claiming "prices reduced
100 per cent." Of course, this should mean that goods are being given
away. One company even went so far as to advise that their catalog
would enable one to "save from 60 to 200 per cent."

The most serious confusion in regard to a base seems to be present in
a mail-order house guarantee of tires. The concern claims that the
guarantee is "without limit as to mileage, months or years of service,"
and that tires will be repaired free or replaced at a charge "only for the
actual amount of mileage you have received." Literally, the base is
infinity, and, if the guarantee were to be fully carried out for all tire
buyers, the company would quickly have to cease selling tires. In
fairness to the concern involved, it should be noted that their adjustment
policy is a generous one.

⁹ See F. E. Croxton and D. J. Cowden, Practical Business Statistics, second edition,
Prentice-Hall, Inc., New York, 1949, pp.

Percentages from small numbers. An almost classic illustration of 
the undesirability of using percentages based upon small numbers is 
given by Chaddock:¹⁰

A short time after Johns Hopkins University had opened certain
courses in the University to women, it was reported that 33⅓ per cent
of the women students had married into the faculty of the institution.

Of course the important information was the number of women students. 
There were only three. When dealing with a small number of cases, the 
use of percentages alone leads to wrong impressions. In these cases 
either percentages should not be used at all or the numbers upon which 
they are based should accompany the percentages. 

Ordinarily, percentages should not be computed unless the base con- 
sists of 100 or more cases. 
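One way of following this rule in practice is to report the bare counts whenever the base is small, and to show the base alongside any percentage that is reported. The sketch below is only an illustration: the function name `describe_share` and the exact wording of its output are our own, and the threshold of 100 is taken from the rule stated above.

```python
def describe_share(count: int, base: int, min_base: int = 100) -> str:
    """Report a share as a percentage only when the base has at least `min_base` cases."""
    if base < min_base:
        # Too few cases for a percentage to be meaningful: give the counts themselves.
        return f"{count} of {base}"
    return f"{100 * count / base:.1f} per cent of {base:,}"


print(describe_share(1, 3))                  # prints: 1 of 3
print(describe_share(1_286_051, 9_161_156))  # prints: 14.0 per cent of 9,161,156
```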

Misplaced decimal points. Mistakes involving misplaced decimal 
points may lead to gross misinterpretations. They are a common sort of 
mistake and should be guarded against. Sir Josiah Stamp gives a rather 
unusual illustration:¹¹

A periodical return of revenue received into the Exchequer was laid 
before Lord Randolph, and his private secretary, Mr. George Gleadowe 
of the Treasury, was looking over his shoulder, and Lord Randolph 
expressed satisfaction at the fact that the Customs revenue had increased 
by 34 per cent, as compared with the corresponding period in the pre- 
ceding year. Mr. Gleadowe pointed out to him that it was only .34 per 
cent. "What difference does that make?" asked Lord Randolph.
When it was explained to him he said, "I have often seen those damned
little dots before, but I never knew until now what they meant."

Misplaced decimal places involve mistakes of such a rudimentary 
nature that the reader may feel they are too elementary to be mentioned 
here. However, a research report from a state university stated that 
during a year the military forces of the United States had consumed 
8.7 per cent of the coffee available during that year. The figures from
which the percentage was computed were 24 and 2,756 millions of pounds.
The correct figure is 0.87 of one per cent.

¹⁰ Robert E. Chaddock, Principles and Methods of Statistics, Houghton Mifflin Co.,
Boston, 1925, pp. 13-14.

¹¹ Sir Josiah Stamp, Some Economic Factors in Modern Life, p. 265. P. S. King
and Son, London, 1929.

A feature writer for a metropolitan newspaper, discussing the Navaho 
Indians, said, "The known Navaho death rate is 360 per 100,000."
Stated in the usual fashion, this would be 3.6 per 1,000 or, roughly,
one-third of the rate for the United States, which was 10.6 during the same
year. Although the basic data from which the Navaho death rate was
computed were of dubious value, it is known that the figure is much
larger than that for the entire country. The feature writer not only
misplaced a decimal (he had intended to say 3,600 per 100,000, which is 36
per 1,000), but may have made an arithmetic mistake as well. 

It is of interest to note that a misplaced decimal always involves a 
serious misstatement, since the least mistake that can occur results in 
the incorrect figure being 10 times as large as it should be or one-tenth as 
large as it should be. 

Computers seem most likely to misplace decimals (1) when large 
absolute numbers are involved or (2) when one of the absolute numbers 
is very large (or small) in relation to the other, resulting in a very large 
(or small) ratio. Two illustrations will suffice.

Over a period of years, the resources of a bank grew from $100,000 to 
$300,000,000. A newspaper stated that the growth was 3,000 per cent. 
Actually, the second figure is 3,000 times the first figure, or 300,000 per
cent of it, and the growth was 299,900 per cent.

An advertisement pointed out that more than 200,000,000 checks a day 
are paid in the United States, and that about 99.9995 per cent of them 
are good. Said the advertisement, "Only 1 out of 2,000 is dishonored."
The percentage and the ratio are in disagreement. Correspondence
revealed that about 1,000 checks per day were bad, so that the ratio
should have been 1 out of 200,000.
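A misplaced decimal is most easily caught by recomputing the figure from the absolute numbers. The sketch below reworks the coffee, bank, and check illustrations of this section. The function name `percent_of` is our own, and exact fractions are used for the check example merely to avoid rounding in so small a proportion.

```python
from fractions import Fraction


def percent_of(value, base):
    """Express value as a percentage of base."""
    return 100 * value / base


# Coffee: 24 million pounds consumed out of 2,756 million pounds available.
coffee_share = percent_of(24, 2_756)
print(f"Coffee consumed by the military: {coffee_share:.2f} per cent")  # 0.87, not 8.7

# Bank resources: $100,000 growing to $300,000,000.
times_as_large = 300_000_000 / 100_000
percent_of_first = percent_of(300_000_000, 100_000)
percent_growth = percent_of(300_000_000 - 100_000, 100_000)
print(f"{times_as_large:,.0f} times; {percent_of_first:,.0f} per cent of the first figure; "
      f"growth of {percent_growth:,.0f} per cent")

# Checks: 99.9995 per cent are good, out of 200,000,000 paid each day.
bad_fraction = 1 - Fraction("99.9995") / 100
bad_checks_per_day = 200_000_000 * bad_fraction
print(f"1 out of {bad_fraction.denominator:,} is dishonored, "
      f"or {int(bad_checks_per_day):,} checks per day")
```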

Arithmetic mistakes. Early in 1953 a prominent government 
official stated, according to newspapers, that Russian Communists 
dominated 800,000,000 persons, and compared this figure with the United 
States population of about 150,000,000. The ratio, he is alleged to have
said, was 7 to 1. The correct ratio is 5.33 to 1.

Improper averaging of percentages. The occasional necessity for 
averaging percentages calls for mention of a pitfall and for consideration 
of the proper procedure. Consider the figures of Table 3.1. It is desired to
know the average proportion of White persons who were foreign born for
the New England division. If we add the six percentages and divide by
six, we have 72.1 ÷ 6 = 12.0 per cent. This figure, however, does not
correctly represent the situation; the six percentages were calculated from
different bases and therefore should be weighted accordingly. The
easiest procedure for obtaining the correct percentage consists of totaling
the White population for the six states (9,161,156 persons), totaling the 
foreign-born White population (1,286,051 persons), and dividing the 
second figure by the first. The result is 14.0 per cent, which is the pro- 
portion of foreign-born White persons in the New England division. The 
same result could also be obtained by averaging the six percentage figures, 
provided each is weighted according to the base from which it has been 
calculated. This procedure of multiplying each percentage by its base, 
summing the results, and dividing by the sum of the base figures (or 
weights) is essentially the same as the method just used. The result, 
however, is a little less accurate, since each percentage figure has been 
rounded. The error involved in rounding a given percentage is magnified 
when the percentage is multiplied. But since some percentages are 
understated and some are overstated, there is a tendency for these errors 
to counterbalance. Under certain conditions, it may be appropriate 
to average percentages without weighting them according to their bases. 
This is discussed on pages 183-184.
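The difference between the unweighted and the weighted procedure can be sketched briefly. Since the six state figures are not reproduced in this section, the two-group data below are purely hypothetical and chosen only to make the contrast plain; the function names are likewise our own. The New England totals quoted above are used at the end as a check.

```python
def unweighted_mean(percentages):
    """Simple average of percentages, ignoring the size of each base."""
    return sum(percentages) / len(percentages)


def weighted_mean(percentages, bases):
    """Average of percentages, each weighted by the base from which it was computed."""
    return sum(p * b for p, b in zip(percentages, bases)) / sum(bases)


def pooled_percentage(parts, bases):
    """Total of the parts as a percentage of the total of the bases."""
    return 100 * sum(parts) / sum(bases)


# Hypothetical illustration: two groups with very different bases.
bases = [1_000, 9_000]
parts = [50, 1_800]
percentages = [100 * p / b for p, b in zip(parts, bases)]   # 5.0 and 20.0

print(unweighted_mean(percentages))        # prints: 12.5
print(weighted_mean(percentages, bases))   # prints: 18.5
print(pooled_percentage(parts, bases))     # prints: 18.5

# New England totals quoted in the text.
new_england = pooled_percentage([1_286_051], [9_161_156])
print(f"{new_england:.1f} per cent")       # prints: 14.0 per cent
```

When unrounded percentages are used, as here, the weighted mean and the pooled percentage agree; with rounded percentages they may differ slightly, as the text notes.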



CHAPTER 8 


The Frequency Distribution 


One method of organizing and summarizing statistical data consists in 
the formation of a frequency distribution. In this device the various 
items of a series are classified into groups and the number of items falling 
into each group is stated. A frequency distribution is shown in Table 
8.3. Sometimes the user of statistics will find frequency distributions 
already constructed in the publications to which he may refer; sometimes 
he will construct his own frequency distribution from unclassified data. 
We shall begin our discussion of the frequency distribution by first con- 
sidering the appearance of the raw or unclassified data. 
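The idea of classifying items into groups and counting the items in each group may be sketched as follows, using only the first fifteen grades listed in Table 8.1. The lower limit of 72 and the class interval of 2 grade points are assumptions made for this illustration and are not necessarily the classes of Table 8.3.

```python
from collections import Counter

# First fifteen grades listed in Table 8.1.
grades = [80.6, 77.6, 78.2, 80.9, 72.1, 72.9, 79.8, 75.5,
          77.2, 78.3, 76.5, 82.6, 83.7, 74.2, 78.1]

LOWEST_LIMIT = 72   # assumed lower limit of the first class
CLASS_WIDTH = 2     # assumed class interval, in grade points


def class_lower_limit(grade: float) -> int:
    """Return the lower limit of the class into which a grade falls."""
    tenths = round(grade * 10)  # work in whole tenths to avoid floating-point edge cases
    steps = (tenths - LOWEST_LIMIT * 10) // (CLASS_WIDTH * 10)
    return LOWEST_LIMIT + steps * CLASS_WIDTH


frequencies = Counter(class_lower_limit(g) for g in grades)
for lower in sorted(frequencies):
    print(f"{lower} and under {lower + CLASS_WIDTH}: {frequencies[lower]}")
```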

RAW DATA 

The unclassified data from which a frequency distribution might be
made may appear as do the data of Table 8.1. Here we have the grades
received for the four-year course by the 225 cadet-midshipmen of the 1952
graduating class of the United States Merchant Marine Academy. The
arrangement of the grades is according to the alphabetical order¹ of the
cadet-midshipmen's names, though we have omitted the names in order
to save space. Another illustration of raw data, from which a frequency
distribution might be constructed, is the payroll of a factory. The
employees on the payroll may be listed alphabetically by name; by
employee number; by departments, and then by name or number; by
seniority; or in some other convenient order. Considering the grades of
the cadet-midshipmen as shown in Table 8.1, it is apparent that very
little information is forthcoming unless the figures are rearranged. When
the data are listed as in Table 8.1, it is a tedious task to find even the
lowest grade and the highest grade. It is even more difficult to ascertain
around what value the grades tend to concentrate, or if, indeed, they do
show such a concentration. These and other steps in analysis are
facilitated by rearranging and summarizing the data.

¹ A slight rearrangement was made in Table 8.1 so that identification of the grade
of any particular cadet-midshipman is impossible.


TABLE 8.1

Grades Received for the Four-Year Course by 225 Cadet-Midshipmen of the
1952 Graduating Class of the United States Merchant Marine Academy

80.6  77.6  78.2  80.9  72.1  72.9  79.8  75.5  77.2  78.3  76.5  82.6  83.7  74.2  78.1
85.8  76.6  77.4  76.2  75.4  82.2  84.6  75.0  81.4  85.0  76.8  88.9  76.5  79.2  85.6
76.2  79.1  75.3  78.4  79.0  76.6  70.7  88.2  82.3  84.5  80.4  78.6  83.1  75.2  79.8
80.8  78.5  83.0  84.2  72.9  74.6  81.1  78.0  79.1  81.9  75.0  77.7  78.7  78.5  77.4
77.1  80.2  78.0  72.9  79.6  89.0  79.5  78.6  79.3  84.7  82.6  75.9  77.5  83.9  77.5
75.6  78.4  70.5  81.5  86.8  76.6  80.0  76.1  84.5  75.6  80.3  78.7  79.6  80.3  85.9
76.8  87.8  78.1  75.2  78.7  80.0  75.3  75.7  83.9  73.7  79.3  76.7  82.4  76.8  79.2
80.6  80.8  85.2  76.4  77.6  78.0  80.5  83.9  77.4  77.5  84.0  84.5  79.8  78.1  85.2
78.3  82.1  77.8  78.3  74.0  85.6  82.5  77.7  82.1  80.8  80.6  77.7  79.7  86.7  82.5
82.3  74.9  77.6  80.4  83.2  76.4  87.4  78.3  83.5  81.6  81.3  73.8  75.3  78.8  81.1
77.7  80.5  86.0  81.2  80.4  76.1  75.1  84.6  78.5  74.2  75.6  78.8  79.0  78.3  82.4
76.0  77.6  85.0  81.7  86.5  81.2  81.0  78.0  80.7  84.5  83.5  78.5  87.7  80.2  79.2
77.6  83.4  74.6  75.8  75.8  80.2  79.1  77.6  80.2  86.3  72.7  78.0  75.8  78.3  84.8
79.5  87.1  79.7  85.4  76.8  83.6  79.2  89.6  83.7  79.7  77.?  76.5  78.0  77.0  75.6
77.3  78.0  84.3  75.0  75.0  78.5  74.9  82.8  74.9  76.9  85.6  77.8  76.9  75.9  81.6


Data from United States Merchant Marine Academy. For the purposes of our illustration, the 
grades, originally given to two decimals, were rounded to one decimal. 


THE ARRAY 

In Table 8.2, the cadet-midshipmen's grades have been rearranged in 
descending order. Such an arrangement (whether ascending or descending) 
is called an array. It arranges the items in order of magnitude. We 
have not summarized; that will be done when we construct the frequency 
distribution. A consideration of the array puts us in a position to learn 
something from the data. First, the array enables us to see at once the 
range of the grades, which varied from 72.1 to 89.6. Second, it may also 
be observed that there is a concentration of grades between 78 and 
80. This will be more clearly seen when we examine the frequency 
distribution and consider measures of central tendency. Third, a somewhat 
more extended examination gives us a rough idea of the distribution of 
the grades. We may observe, for example, that there are few grades 
below 74 or above 87. This particular feature of the series will be much 
more readily studied when we have the frequency distribution. Fourth, 
it may be noticed that the figures show a fair degree of continuity. If 
the grades are expressed as whole percentages, all consecutive values from 
72 to 90 are represented. If we consider the figures as shown, to one 
decimal place, we may observe that within the range of 75.0 to 85.0 
inclusive, which includes 189 of the 225 cadet-midshipmen, 86 of the 
possible 101 values are to be found. If the grades had been for a larger 
number of students, this tendency would have been more marked. 

TABLE 8.2 

Array of Grades Received for the Four-Year Course by 225 Cadet-Midshipmen 
of the 1952 Graduating Class of the United States Merchant Marine 
Academy 

89.6  84.6  82.5  80.6  79.5  78.5  77.7  76.6  75.3
89.0  84.6  82.4  80.6  79.5  78.5  77.6  76.6  75.3
88.9  84.5  82.4  80.6  79.5  78.4  77.6  76.5  75.3
88.2  84.5  82.4  80.5  79.3  78.4  77.6  76.5  75.2
87.8  84.5  82.3  80.5  79.3  78.3  77.6  76.5  75.2
87.7  84.5  82.3  80.4  79.2  78.3  77.6  76.4  75.1
87.4  84.3  82.2  80.4  79.2  78.3  77.6  76.4  75.0
87.1  84.2  82.1  80.4  79.2  78.3  77.5  76.2  75.0
86.8  84.0  81.9  80.3  79.2  78.3  77.5  76.2  75.0
86.7  83.9  81.7  80.3  79.1  78.3  77.5  76.1  75.0
86.5  83.9  81.6  80.2  79.1  78.2  77.4  76.1  74.9
86.0  83.9  81.6  80.2  79.1  78.1  77.4  76.0  74.9
85.9  83.7  81.5  80.2  79.0  78.1  77.4  75.9  74.9
85.8  83.7  81.4  80.2  79.0  78.1  77.3  75.9  74.6
85.6  83.6  81.3  80.0  78.8  78.0  77.2  75.8  74.6
85.6  83.5  81.2  80.0  78.8  78.0  77.1  75.8  74.2
85.6  83.5  81.2  79.8  78.7  78.0  77.1  75.8  74.2
85.4  83.4  81.1  79.8  78.7  78.0  77.0  75.8  74.0
85.3  83.2  81.1  79.8  78.7  78.0  76.9  75.7  73.8
85.2  83.1  81.0  79.7  78.6  78.0  76.9  75.6  73.7
85.2  83.0  80.9  79.7  78.6  77.8  76.8  75.6  72.9
85.0  82.8  80.8  79.7  78.5  77.8  76.8  75.6  72.9
85.0  82.6  80.8  79.7  78.5  77.7  76.8  75.6  72.9
84.8  82.6  80.8  79.6  78.5  77.7  76.7  75.5  72.7
84.7  82.5  80.7  79.6  78.5  77.7  76.6  75.4  72.1
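
The making of an array, and the reading of the range from it, can be sketched in a few lines of Python. This is only an illustration: it assumes, for brevity, a sample consisting of the first nine grades of Table 8.1, and the variable names are ours.

```python
# Illustrative sample: the first nine grades of Table 8.1 (an assumption
# made for brevity; the full table holds 225 grades).
grades = [80.6, 77.6, 78.2, 80.9, 72.1, 72.9, 79.8, 75.5, 77.2]

# An array is simply the items arranged in order of magnitude.
array_descending = sorted(grades, reverse=True)

# The range runs from the lowest item to the highest item.
lowest, highest = array_descending[-1], array_descending[0]

print(array_descending)
print(f"range: {lowest} to {highest}")
```

Sorting is the whole of the work here, which is why the array is troublesome by hand but simple by machine.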

The array, however, is a cumbersome form of the data. Furthermore, 
it is troublesome to construct, because of the necessity of rearranging all 
the items. One fairly satisfactory method of constructing an array con- 
sists of recording the figures on small cards and sorting the cards. Of 
course, if the data are punched on mechanical tabulating cards, the 
construction of an array is simple. 

When studying grades, we may frequently want to make an array. 
Some institutions publish each year a roll of the graduating class, listing 
the names and standings of the students in order from highest to lowest. 




If we are interested in a campaign to raise funds for a hospital or 
community chest, it might be very useful (for publicity purposes, for 
example) to list the individual gifts in descending order. It is obvious, 
however, that such a listing of 500 or 1,000 contributions would be 
cumbersome and of limited value. In many instances there is no particular 
advantage in making an array. It would be a waste of time for a concern to 
make an array of the amounts paid to its employees each month. There is not 
much reason why a bank should make an array of the daily balances of 
its many depositors. On the other hand, a student of vital statistics might 
find it very valuable in a study of birth rates to array the various cities in 
ascending or descending order and consider the reasons for the differences. 

THE FREQUENCY DISTRIBUTION 

The array of Table 8.2 rearranged the midshipmen’s grades. The 
frequency distribution of Table 8.3 summarizes the grades into 9 groups or 
classes. It is obvious that the frequency distribution does not show the 
details given in the array, but much is gained by the summarization. We 
can see that the lowest grade is not below 72 and that the highest grade is 
not quite 90; we cannot ascertain the exact values of the highest and 
lowest grades as we did from the array. The concentration of grades in the 
neighborhood of 78-80 is apparent at a glance. If we draw a curve of 
the frequency distribution, as in Chart 8.1, we can visualize the data 
readily and we may make comparisons with other series, as discussed in 
a later section of this chapter. Having classified the data, we are in a 
position to make rapid computations of certain values (discussed in the 
following chapters) which will assist us in describing and analyzing the 
data. 

TABLE 8.3 

Frequency Distribution of Grades Received 
for the Four-Year Course by 225 Cadet-
Midshipmen of the 1952 Graduating 
Class of the United States 
Merchant Marine Academy 

Grade          Number of cadet-midshipmen
72.0-73.9                  7
74.0-75.9                 31
76.0-77.9                 42
78.0-79.9                 54
80.0-81.9                 33
82.0-83.9                 24
84.0-85.9                 22
86.0-87.9                  8
88.0-89.9                  4
Total                    225

When an array is available, the frequency distribution may be made by 
merely counting the items. It is not advisable, however, to make an 
array solely for the purpose of making the frequency distribution, because 
too great an amount of time is required to construct the array. 
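
When the figures are at hand in unorganized form, the counting can be done directly, without first making an array. The following is a minimal sketch, assuming the class limits of Table 8.3 and, for brevity, only the first eighteen grades of Table 8.1; the function name `class_label` is ours.

```python
from collections import Counter

# Illustrative sample: the first eighteen grades of Table 8.1 (assumed
# here for brevity; the same tally applies to all 225 grades).
grades = [80.6, 77.6, 78.2, 80.9, 72.1, 72.9, 79.8, 75.5, 77.2,
          78.3, 76.5, 82.6, 83.7, 74.2, 78.1, 85.8, 76.6, 77.4]

LOWEST_LIMIT = 72.0   # lower limit of the first class in Table 8.3
INTERVAL = 2.0        # class interval of Table 8.3


def class_label(grade):
    """Return the Table 8.3 style label of the class holding `grade`."""
    # Work in tenths of a point so that a grade such as 74.0 falls
    # cleanly into the class that begins at 74.0.
    tenths = round(grade * 10)
    start = round(LOWEST_LIMIT * 10)
    width = round(INTERVAL * 10)
    index = (tenths - start) // width
    lower = (start + index * width) / 10
    upper = lower + INTERVAL - 0.1
    return f"{lower:.1f}-{upper:.1f}"


frequencies = Counter(class_label(g) for g in grades)
for label in sorted(frequencies):
    print(label, frequencies[label])
```

Working in whole tenths is a design choice of this sketch; it avoids any doubt about which class receives a grade lying exactly on a lower limit.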

Chart 8.1. Grades Received for the Four-Year Course by 225 Cadet-
Midshipmen of the 1952 Graduating Class of the United States Merchant 
Marine Academy. Data of Table 8.3. (Vertical axis: number of 
cadet-midshipmen; horizontal axis: grade.) 

If the data are in unorganized form, as in Table 8.1, we may construct 
a frequency distribution by a scoring device similar to that shown in 
Chapter 2. Another method of handling the figures consists of making an 
entry form such as that of Table 8.4. This is less laborious than making 
an array and has certain advantages over the scoring procedure. The 
advantages of the entry form are: (1) we can scan the columns to see if 
any item is incorrectly entered; (2) we can total the items entered and 
check this total against the total of the unclassified data; (3) if we should 
decide that we want classes of 1 per cent or 3 per cent instead of 2 per 
cent, we can re-form our frequency distribution with little effort; (4) as 
will be shown in the next chapter, the entry form enables us to find out 
how closely the mid-value of a class agrees with the average of the items 
in that class. If desired, the classes used in the entry form may be 
TABLE 8.4 

Entry Form for Grades Received for the Four-Year Course by 225 Cadet-
Midshipmen of the 1952 Graduating Class of the United States Merchant 
Marine Academy 

72.0-73.9  74.0-75.9  76.0-77.9  78.0-79.9  80.0-81.9  82.0-83.9  84.0-85.9  86.0-87.9  88.0-89.9
72.9       75.8       76.2       78.3       80.6       82.5       85.3       87.8       88.2
73.7       74.9       76.5       79.7       80.8       82.3       85.4       87.4       89.0
72.7       75.4       77.5       79.8       81.9       83.9       85.6       87.7       89.6
73.8       75.0       76.4       79.0       80.0       83.4       84.5       86.8       88.9
72.1       74.9       77.6       78.0       81.6       82.8       85.2       86.7
72.9       74.6       77.6       79.2       81.2       82.6       84.6       86.5
72.9       75.6       76.5       78.5       81.0       82.2       84.5       87.1
           74.9       76.1       79.6       81.3       82.3       84.2       86.0
           75.6       77.6       79.3       80.4       83.0       84.5
           75.6       77.7       78.1       80.7       82.4       84.6
           75.2       76.8       78.3       80.9       82.4       85.6
           75.3       77.0       78.6       80.5       83.6       85.0
           75.1       77.7       78.2       80.8       83.7       84.8
           75.8       77.5       78.1       80.4       82.4       84.3
           75.8       76.7       78.0       80.4       83.9       85.8
           74.2       77.6       78.0       80.3       82.1       85.0
           75.0       76.1       78.7       80.6       83.2       84.7
           75.8       76.2       79.6       81.4       83.5       84.0
           75.3       76.0       78.3       80.0       83.7       85.9
           74.6       77.3       79.2       81.1       83.1       85.2
           75.3       76.9       79.1       80.2       82.6       84.5
           74.2       76.8       78.5       81.1       83.9       85.6
           75.0       77.6       78.6       81.5       82.5
           75.5       77.4       78.4       80.6       83.5
           75.2       77.4       78.7       81.7
           75.7       77.8       78.8       80.2
           74.0       77.7       78.3       80.3
           75.6       76.4       78.0       80.8
           75.0       77.8       78.1       80.5
           75.9       77.1       78.6       80.2
           75.9       77.5       79.3       81.2
                      77.7       79.6       80.2
                      76.9       78.7       81.6
                      76.6       79.2
                      76.8       78.5
                      77.6       78.5
                      77.1       79.8
                      77.2       78.4
                      77.4       79.6
                      76.6       78.3
                      76.6       79.7
                      76.5       79.1
                                 79.5
                                 79.7
                                 79.0
                                 78.0
                                 78.3
                                 79.8
                                 79.1
                                 78.0
                                 78.8
                                 79.2
                                 79.7
                                 78.5

narrower than we think we shall want for the frequency distribution. 
These classes can then be readily combined into wider ones, using 
whatever interval and whatever class limits seem advisable. 

All the class intervals of the frequency distribution of Table 8.3 are 2 
per cent. Charting and computations are facilitated when the class 
intervals are all the same. Whenever possible, therefore, frequency 
distributions should be constructed with uniform class intervals. This, 
however, is not always practicable. Table 8.5 shows a frequency 
distribution which has non-uniform class intervals. In this instance the 
result is to give more detailed information for the secretaries having lower 
earnings. 

TABLE 8.5 

Average Straight-Time Weekly Earnings of 14,817 Female Secretaries 
in Non-Manufacturing Industries in New York City, January 1952 

                                               Frequency densities,
                                   Number of   number of women
Weekly earnings                    women       per $2.50 of earnings
$ 32.50 but less than $ 35.00           2            2
  35.00 but less than   37.50           3            3
  37.50 but less than   40.00          54           54
  40.00 but less than   42.50         110          110
  42.50 but less than   45.00         277          277
  45.00 but less than   47.50         427          427
  47.50 but less than   50.00         509          509
  50.00 but less than   52.50       1,079        1,079
  52.50 but less than   55.00         760          760
  55.00 but less than   57.50       1,383        1,383
  57.50 but less than   60.00       1,066        1,066
  60.00 but less than   65.00       2,679        1,339.5
  65.00 but less than   70.00       2,180        1,090
  70.00 but less than   75.00       1,454          727
  75.00 but less than   80.00       1,126          563
  80.00 but less than   85.00         613          306.5
  85.00 but less than   90.00         533          266.5
  90.00 but less than  100.00         354           88.5
 100.00 but less than  110.00         155           38.75
 110.00 but less than  120.00          26            6.5
 120.00 or more                        27          ...
Total                              14,817

Data from United States Bureau of Labor Statistics, Occupational Wage Survey, New York, 
New York, January 1952, page 10. 

Selecting the number of classes. No hard-and-fast rule can be 
given as to the number of classes into which a frequency distribution 
should be divided. If there are too many classes, many of them will 
contain only a few frequencies and the distribution may show 
irregularities which are not attributable to the behavior of the variable 
being measured. If there are too few classes, so many frequencies will be 
crowded into a class as to cause much information to be lost. The 
number of classes to use depends partly upon the nature of the data (as 
will be noted for meal checks in the next section), and partly upon the 
number of frequencies in the series. The greater the number of 
frequencies, the more classes we may have. The regularity with which the 
frequencies are distributed within the range of values under consideration 
is also a determining factor. The more regular the distribution of the 
frequencies, the more classes we may use, since data having a high degree 
of regularity may be divided into a large number of classes without 
showing unwarranted gaps and irregularities in the frequencies. In general, 
it might be said that fewer than 6 or 8 classes should rarely be used, and 
that more than 16 classes would be useful only for working with extensive 
data. For illustrative purposes, 9 classes were used in Table 8.3. When 
the number of classes has been determined,² the range of values for the 
entire distribution indicates the class interval to be used. 

Selecting class limits. It was pointed out in Chapter 4 that the 
mid-value of each class is used to represent the class. The mid-values of 
the classes are made use of not only when charting the frequency 
distribution, but also in making various computations to be discussed in 
later chapters. If the limits of each class are not clearly indicated, the 
mid-value, which is the average of the upper and lower limits, cannot be 
properly determined. The adequacy of the mid-value assumption will 
be discussed more fully in Chapter 9. It is important at this point to 
make clear that, when a frequency distribution is being constructed, the 
class limits should be so chosen that the mid-value of each class will 
coincide, so far as possible, with any values around which the data tend 
to be concentrated. 

Suppose that measurements are made of the academic standing of a 
large group of college freshmen upon a numerical scale ranging from 0 to 
100. The data could be expected to be graduated fairly smoothly from, 
say, 50 to nearly 100. There would be students rating 88.0 and others 
89.0; in addition, there would be still others falling between these two 
values. If a large enough group were to be measured, the minuteness of 
the variations between 88.0 and 89.0 would be limited only by the accuracy 
of the measuring instrument (in this case, the grading system). There 
would not be a series of values around which the frequencies would tend to 
concentrate, and the problem mentioned at the end of the preceding 
paragraph would not arise. 

² Snedecor has suggested that the class interval for a frequency distribution should 
be not larger than one-fourth of the estimated population standard deviation (see 
Chapter 24) of the data. See G. W. Snedecor, Statistical Methods, 4th ed., Collegiate 
Press, Ames, Iowa, 1946, p. 170. 

For our figures the estimated population standard deviation, computed from the 
raw data, is 3.67. Following Snedecor’s rule, the class intervals should be 0.9 or less 
in width, so that we would have 20 or more classes. Note that this rule requires the 
time-consuming computation of the estimated population standard deviation from 
the ungrouped data, and also that it fails to take into consideration the number of 
items involved. 
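
The arithmetic behind the footnote on Snedecor's rule may be checked as follows. The sketch assumes only the figures quoted in this chapter (a standard deviation of 3.67 and grades ranging from 72.1 to 89.6); the variable names are ours.

```python
import math

# Figures quoted in the text and in the footnote.
std_dev = 3.67                # estimated population standard deviation
lowest, highest = 72.1, 89.6  # range of the grades, read from the array

# Snedecor's suggestion: a class interval no larger than one-fourth
# of the estimated population standard deviation.
max_interval = std_dev / 4
data_range = highest - lowest

# Smallest whole number of classes of that width covering the range.
min_classes = math.ceil(data_range / max_interval)

print(f"largest interval: {max_interval:.2f}")
print(f"classes needed:   {min_classes}")
```

The result agrees with the footnote: an interval of about 0.9 and at least 20 classes, against the 9 classes actually used in Table 8.3.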

On the other hand, consider the meal checks of a cafeteria, many (but 
not all) of which are a multiple of 5 cents. In this instance, the class 
intervals should be written 8-12 cents, 13-17 cents, 18-22 cents, and so 
forth, thus giving mid-values of 10 cents, 15 cents, 20 cents, and so on, 
which coincide with the concentration points. 
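
The choice of limits for the meal checks can be stated as a small rule: place each class symmetrically about a concentration point. The sketch below assumes limits in whole cents and an odd class interval; the function name is hypothetical.

```python
def classes_centered_on(points, interval):
    """Build (lower, upper, mid) classes whose mid-values are `points`.

    Limits are in whole cents, and `interval` is assumed to be odd so
    that each mid-value sits exactly in the middle of its class.
    """
    half = interval // 2
    return [(p - half, p + half, p) for p in points]


# Concentration points named in the text: multiples of 5 cents.
for lower, upper, mid in classes_centered_on([10, 15, 20], interval=5):
    # The mid-value is the average of the two class limits.
    assert (lower + upper) / 2 == mid
    print(f"{lower}-{upper} cents, mid-value {mid} cents")
```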

The data of freshmen grades and the ratings of midshipmen are 
illustrations of what is termed a continuous variable, since the values are 
capable of infinitely small variations from each other. Heights and weights 
of people are also continuous variables. Length of life is another 
illustration. The data of cafeteria meal checks are illustrative of a 
discrete or discontinuous variable, since the values differ from each other 
by finite amounts; in this case, one cent. A discrete variable need not 
show the concentrations which were present in the meal-check data. For 
example, if many workmen are employed at similar tasks and are paid on a 
piece-rate basis (that is, upon the basis of amount produced), it is quite 
possible that there may be individuals receiving $61.21, $61.22, $61.23, and 
so forth, for a week's work. Although piece rates might be, and often are, 
in fractions of a cent, the weekly payment must be in terms of whole cents. 

The foregoing suggests an important consideration; namely, that we 
are not so much concerned with the fact that a variable is discrete as 
we are with the fact that the data may be broken and that there are 
inherent gaps and concentrations in the actual data in hand. Such a 
situation often occurs when dealing with salaries. One organization 
with several hundred employees paid salaries ranging from about $1,200 
to more than $15,000 per year. There was in no sense an evenly graduated 
distribution between these limits. The gaps between adjacent 
values ranged from $10 to $5,000, and there were pronounced concentrations 
at various customary salaries such as $2,500, $3,000, $3,600, $4,000, 
$4,500, $5,000, and so on. The selection of class limits for a distribution 
of this type presents great difficulty. Often it is not possible to adjust 
the mid-values to coincide with all concentration points. An approximate 
adjustment must then suffice. 

The fact that we may be dealing with a continuous variable does not 
warrant us in selecting class limits blindly. If data are being collected 
concerning weights of individuals, reported to the nearest pound, persons 
reported as weighing 142 pounds would vary between 141.5 pounds and 
142.5 pounds; as a group, they would average about 142 pounds. Suppose, 
however, that weight is reported to the last full pound. In that 
event, persons reported as weighing 142 pounds would vary between 
exactly 142 pounds and just under 143 pounds; as a group, they would 
average about 142.5 pounds. Let us assume that a frequency distribution 
with class interval of 3 pounds is to be formed. If weights have 
been reported to the nearest pound, it is correct to write class intervals 
“142-144, 145-147, 148-150,” and so on, with mid-values of 143, 146, 
149, and so forth. If, however, weights have been reported to the last 
full pound, the above is incorrect, but it is correct to write “142 and under 
145, 145 and under 148, 148 and under 151,” and so on, with mid-values 
of 143.5, 146.5, 149.5, and so forth. 
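
The two ways of reporting weights lead to different mid-values for what looks like the same class. A minimal sketch, with a hypothetical function name and keyword, makes the distinction explicit:

```python
def mid_value(lower, upper, reported_to="nearest"):
    """Mid-value of a class whose written limits are whole pounds.

    reported_to="nearest": weights rounded to the nearest pound, so
        a class written 142-144 really covers 141.5 up to 144.5.
    reported_to="last_full": weights cut to the last full pound, so
        a class written "142 and under 145" covers 142.0 up to 145.0.
    """
    if reported_to == "nearest":
        true_lower, true_upper = lower - 0.5, upper + 0.5
    elif reported_to == "last_full":
        true_lower, true_upper = float(lower), float(upper)
    else:
        raise ValueError("reported_to must be 'nearest' or 'last_full'")
    return (true_lower + true_upper) / 2


print(mid_value(142, 144, "nearest"))    # class written 142-144
print(mid_value(142, 145, "last_full"))  # class written 142 and under 145
```

The half-pound difference between the two results is exactly the difference between the group averages of 142 and 142.5 pounds noted in the text.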

Sometimes, when dealing with a continuous variable, the classes are 
written so that the limits appear to overlap. For example, the data of 
cadet-midshipmen's grades could have been classified 72.0-74.0, 
74.0-76.0, 76.0-78.0, and so on. When this is done, frequencies which fall on 
a class limit are divided between the two classes, usually resulting in some 
fractional frequencies in the distribution.³ A frequency distribution 
using these classes may be easily constructed from the array of Table 8.2 
or the entry form of Table 8.4. Overlapping class limits are not often 
used for data of grades. 
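
The division of a frequency between two classes may be sketched as below. The handful of grades is hypothetical, chosen so that some fall exactly on a class limit, and the sketch assumes that no grade lies on the outermost limits of the distribution.

```python
from collections import defaultdict

# Hypothetical handful of grades chosen so that some fall exactly on
# a class limit; they are not a sample drawn from Table 8.1.
grades = [73.8, 74.0, 74.0, 75.3, 76.0, 77.1]

LOWEST_LIMIT_TENTHS = 720   # 72.0, expressed in tenths of a point
WIDTH_TENTHS = 20           # classes 72.0-74.0, 74.0-76.0, ...

frequencies = defaultdict(float)
for grade in grades:
    tenths = round(grade * 10)
    index, remainder = divmod(tenths - LOWEST_LIMIT_TENTHS, WIDTH_TENTHS)
    if remainder == 0 and index > 0:
        # The grade sits on a limit shared by two classes:
        # credit one-half of a frequency to each.
        frequencies[index - 1] += 0.5
        frequencies[index] += 0.5
    else:
        frequencies[index] += 1.0

for index in sorted(frequencies):
    lower = (LOWEST_LIMIT_TENTHS + index * WIDTH_TENTHS) / 10
    upper = lower + WIDTH_TENTHS / 10
    print(f"{lower:.1f}-{upper:.1f}: {frequencies[index]}")
```

The fractional frequencies still total the number of items, so nothing is lost by the division.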

Curves of frequency distributions. The graphic representation of a 
frequency distribution was discussed in Chapter 4. Although a 
frequency distribution may be represented by either a column diagram or a 
curve, it is usual to employ the latter device. (We shall make use of the 
column diagram in Chart 8.5 and in Chapter 23.) One advantage of the 
curve is that two or more curves may readily be drawn on the same axes 
for purposes of comparison. In any event, the first step in the analysis 
of a frequency distribution should be the construction of a chart, for it 
will tell us at a glance with which of the following types of distributions 
we are dealing. 

Chart 8.1, showing the graphic appearance of the data of cadet- 
midshipmen’s grades, is not symmetrical, but is slightly skewed to the 
right. (Skewness is discussed in Chapter 10.) Many frequency dis- 
tribution curves encountered in the social sciences are asymmetrical and 
frequently are skewed to the right. Only rarely do we find a curve 
skewed to the left. 


³ For an example, see F. E. Croxton, Elementary Statistics with Applications in 
Medicine, Prentice-Hall, Inc., New York, 1953, pp. 41-42. 
Chart 8.2. Heights of 9,552 Male Industrial Workers. Data from 
A Health Study of Ten Thousand Male Industrial Workers, p. 59, United 
States Public Health Service, Public Health Bulletin No. 162. (Vertical 
axis: number of male workers; horizontal axis: height in inches.) 

Chart 8.3. Age at Death of 371 American Inventors. Data from “Bio-Social 
Characteristics of American Inventors,” by Sanford Winston, American Sociological 
Review, Vol. 2, No. 6, pp. 837-849. (Vertical axis: number of inventors; 
horizontal axis: age in years.) 

Biological and anthropometrical series (especially those involving linear 
measurements, such as height, rather than two- or three-dimension 
measurements, such as waist circumference or weight) frequently yield 
curves which are roughly symmetrical. Such a series is shown in Chart 
8.2, which pictures the height distribution of a large group of male 
industrial workers. 
A curve which is skewed to the left is shown in Chart 8.3, which 
depicts the age at death of 371 American inventors. As pointed out in 
Chapter 10, where the amount of skewness in this series is ascertained, 
the skewness may be characteristic of the variable or may be due to the 
fact that nearly one-fifth of the inventors included in the study were born 
before 1800. 

The curve of Chart 8.4 indicates the length of time during which cars 
were parked in Albuquerque, New Mexico, and shows a great many cars 
parked for short periods and generally smaller numbers parked for longer 
lengths of time. Curves having this characteristic “reverse J” shape 
may be encountered occasionally. 

Chart 8.4. Parking Time of Motor Vehicles in Albuquerque, 
New Mexico. The data are from the Automotive 
Safety Foundation and are for June, July, and August 1949. 
(Vertical axis: thousands; horizontal axis: hours.) 

Graphic representation when the class intervals are unequal. 
For some frequency distributions, it is not feasible to maintain the same 
class interval throughout. The distribution of Table 8.5 has eleven 
classes of $2.50, six classes of $5.00, three classes of $10.00, and one class 
of indeterminate width. It would not have been desirable to have used 
$2.50 class intervals throughout, since that would have necessitated 35 
classes to cover the range from $32.50 to $120.00. This would be too 
many classes to be useful and would provide a more detailed breakdown 
than needed for the upper ranges of the series. Class intervals of $5.00 
throughout would not have been desirable either, since details concerning 
secretaries having earnings of less than $60.00 per week would have been 
lost. 

To draw a suitable chart of the data of Table 8.5, it is necessary to 
make adjustments for the varying class intervals. The class “$60.00 but 
less than $65.00” is twice as wide as the classes which precede it. We 
do not know how many of the 2,679 secretaries earned $60.00 but less 
than $62.50 a week and how many earned $62.50 but less than $65.00 a 
week. We can say, however, that on the average there were 1,339.5 
secretaries in each of the two halves of the class “$60.00 but less than 
$65.00.” Adjustments of this sort have been made in the last column 
of Table 8.5, where the frequencies are stated per $2.50 of earnings. 
These are frequency densities. 

Chart 8.5. Frequency Densities of Average Straight-Time Weekly 
Earnings of 14,817 Female Secretaries in Non-Manufacturing Industries in New 
York City, January 1952. Data from Table 8.5. (Vertical axis: number of 
women per $2.50 of earnings.) 
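
The adjustment in the last column of Table 8.5 amounts to scaling each frequency by the ratio of the base width ($2.50) to the width of its class. A minimal sketch, using four classes taken from Table 8.5 and a function name of our own choosing:

```python
# (lower limit, upper limit, number of women) for a few classes of
# Table 8.5; earnings are in dollars.
classes = [
    (57.50, 60.00, 1066),
    (60.00, 65.00, 2679),
    (90.00, 100.00, 354),
    (110.00, 120.00, 26),
]

BASE_WIDTH = 2.50  # densities are stated per $2.50 of earnings


def frequency_density(lower, upper, frequency, base_width=BASE_WIDTH):
    """Frequency per `base_width` of earnings within the class."""
    return frequency * base_width / (upper - lower)


for lower, upper, frequency in classes:
    density = frequency_density(lower, upper, frequency)
    print(f"${lower:.2f} but less than ${upper:.2f}: {density:g}")
```

For a class already $2.50 wide the density equals the frequency; for the open class “$120.00 or more” no width is known, so no density can be computed.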

The distribution of secretaries’ earnings may now be plotted in terms 
of the frequency densities, as in Chart 8.5. It is not possible to make an 
estimate of the width of the last class interval in Table 8.5, so no 
adjustment of the frequencies of that class has been made. Notice on the 
chart how the reader’s attention was called to the presence of these 27 
secretaries. Alternatively, the data of frequency densities could have 
been shown by a curve instead of a column diagram, and this was done in 
Chart 4.25. However, the column diagram makes it easier for the 
reader to note the changing class width. The irregularities of Chart 8.5 
do not indicate that too many classes were used. They are due to the 
nature of the basic data, there being concentrations on weekly salaries of 
$50 and $55. 

Graphic comparison of frequency distributions. Table 8.6 
shows two frequency distributions, one giving the straight-time weekly 
earnings of 940 class B bookkeeping-machine operators, the other 
presenting the straight-time weekly earnings of 457 key-punch operators. 
Both series are for females only. If the two distributions dealt with 
approximately the same number of women, we could merely plot two frequency 
curves on the same grid and study their outlines. The result of doing this 
for the two series of Table 8.6 is shown in Chart 8.6. The comparison is 
not particularly illuminating, although it is obvious that the most 
prevalent earnings are a little higher for key-punch operators than for 
bookkeeping-machine operators. If each frequency is expressed as a 
percentage of the total of which it is a part, we obtain the percentage 
frequency distributions, which are also given in Table 8.6. Plotting the 
two percentage frequency distributions, as in Chart 8.7, enables us to 
make a graphic comparison of the two series, which is no longer 
complicated because of the different number of items. The relative 
importance of all of the various classes may now readily be seen. 

Chart 8.6. Average Straight-Time Weekly Earnings of 940 Female 
Bookkeeping-Machine Operators, Class B, and of 457 Key-Punch Operators, in 
Finance, Insurance, and Real Estate Offices in Philadelphia, October 1952. 
Data from Table 8.6. (Vertical axis: number of women; horizontal axis: 
dollars.) 
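
The percentage frequencies of Table 8.6 can be reproduced from the class frequencies. The sketch below takes the two series of frequencies from that table; the function name is ours.

```python
# Frequencies of Table 8.6, in class order from $30.00 upward.
bookkeeping = [37, 78, 179, 191, 184, 85, 83, 65, 32, 5, 1]
key_punch = [15, 41, 55, 76, 91, 57, 45, 43, 22, 11, 1]


def percentage_distribution(frequencies):
    """Express each frequency as a percentage of the series total."""
    total = sum(frequencies)
    return [round(100 * f / total, 1) for f in frequencies]


print(sum(bookkeeping), percentage_distribution(bookkeeping))
print(sum(key_punch), percentage_distribution(key_punch))
```

Because each percentage is rounded separately, the rounded figures need not add to exactly 100.0.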

The comparison of the two series of Table 8.6 was facilitated because 
the class intervals were the same. If two series, expressed in the same 
units but having different class intervals, are to be compared graphically, 
we may plot frequency densities per unit (that is, per dollar, per pound, 
or whatever the unit may be). If the two series also differ appreciably 
in regard to the number of items involved, the areas under the two curves 
may be made the same by computing percentage frequencies and 
expressing the percentage frequencies as frequency densities. 

TABLE 8.6 

Average Straight-Time Weekly Earnings of 940 Female Bookkeeping-Machine 
Operators, Class B,* and of 457 Key-Punch Operators in Finance, 
Insurance, and Real Estate Offices in Philadelphia, October 1952 

                                        Number                Per cent of total
                                 Bookkeeping-  Key-       Bookkeeping-  Key-
                                 machine       punch      machine       punch
Weekly earnings                  operators     operators  operators     operators
$30.00 but less than $32.50          37           15          3.9          3.3
 32.50 but less than  35.00          78           41          8.3          9.0
 35.00 but less than  37.50         179           55         19.0         12.0
 37.50 but less than  40.00         191           76         20.3         16.6
 40.00 but less than  42.50         184           91         19.6         19.9
 42.50 but less than  45.00          85           57          9.0         12.5
 45.00 but less than  47.50          83           45          8.8          9.8
 47.50 but less than  50.00          65           43          6.9          9.4
 50.00 but less than  52.50          32           22          3.4          4.8
 52.50 but less than  55.00           5           11          0.5          2.4
 55.00 but less than  57.50           1            1          0.1          0.2
Total                               940          457        100.0        100.0

* A Class B Bookkeeping-Machine Operator “keeps a record of one or more phases or sections of a 
set of records usually requiring some knowledge of basic bookkeeping. Phases or sections include 
accounts payable, payroll, customers’ accounts (not including simple type of billing described under 
‘biller, machine’), cost distribution, expense distribution, inventory control, etc. May check or assist 
in preparation of trial balances and prepare control sheets for the accounting department.” 

Data from U. S. Bureau of Labor Statistics, Wages and Salaries in Philadelphia, Pennsylvania, 
October 1952, Preliminary Release, Table A-1. 

Occasionally we wish the differences between the numbers of items in 
two series to be apparent, as in Charts 24.1-24.4, and in such a situation 
we do not use percentage frequencies. Frequency densities would, how- 
ever, be used when needed, as in Charts '24 A, 24.3, and 24,4 A. 
When two frequency distributions are expressed in terms of different 
units (dollars, pounds, inches, and so on), a direct graphic comparison is 
not feasible, since there is no simple way in which the X-scales may be 
adjusted to each other. Certain computed values, to be discussed later, 
may be used to obtain effective numerical comparison. 


Chart 8.7. Percentage Distributions of Average Straight-Time Weekly 
Earnings of 940 Female Bookkeeping-Machine Operators, Class B, and of 457 
Key-Punch Operators, in Finance, Insurance, and Real Estate Offices in 
Philadelphia, October 1952. Data from Table 8.6. (Vertical axis: per cent 
of women.) 

Cumulative frequency distributions and the ogive. The data of 
Table 8.3 show the usual (non-cumulative) form of the frequency dis- 
tribution and enable us to ascertain the number of cadet-midshipmen 
falling in each class. Sometimes, however, it may be useful to know how 
many or what proportion of students received less than certain stated 
grades, or to know how many or what proportion of students received 
specified grades or above. This information may be seen clearly in a 
cumulative table such as Table 8.7. In this table the frequencies of 
Table 8.3 have been accumulated upon a “less than” basis and also upon 
an “or more” basis. 
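
Both kinds of accumulation can be obtained from the class frequencies by running totals, one taken from the lowest class upward and the other from the highest class downward. A minimal sketch, assuming the class frequencies implied by the cumulative figures of Table 8.7:

```python
from itertools import accumulate

# Class frequencies consistent with the cumulative totals of Table 8.7.
frequencies = [7, 31, 42, 54, 33, 24, 22, 8, 4]
total = sum(frequencies)

# "Less than" basis: number falling below the upper limit of each class.
less_than = list(accumulate(frequencies))

# "Or more" basis: number equalling or exceeding the lower limit.
or_more = list(accumulate(reversed(frequencies)))[::-1]

# The same accumulations expressed as percentages of the total.
less_than_pct = [round(100 * n / total, 1) for n in less_than]
or_more_pct = [round(100 * n / total, 1) for n in or_more]

print(less_than)
print(or_more)
```

For any class, the “less than” figure and the “or more” figure of the next higher class add to the total of 225, which provides a ready check on the work.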

When cumulative frequency distributions are drawn, the frequencies 
are plotted opposite the appropriate class limits, resulting in curves such 
as those shown in Chart 8.8. Such curves are called ogives. 

TABLE 8.7 

Cumulative Distributions of Grades of the 1952 Graduating 
Class of the United States Merchant Marine Academy 

               Number of cadet-midshipmen     Per cent of cadet-midshipmen
               whose grades                   whose grades
               Fell below    Equalled or      Fell below    Equalled or
               the upper     exceeded the     the upper     exceeded the
               limit of      lower limit      limit of      lower limit
Grade          each class    of each class    each class    of each class
72.0-73.9           7            225              3.1          100.0
74.0-75.9          38            218             16.9           96.9
76.0-77.9          80            187             35.6           83.1
78.0-79.9         134            145             59.6           64.4
80.0-81.9         167             91             74.2           40.4
82.0-83.9         191             58             84.9           25.8
84.0-85.9         213             34             94.7           15.1
86.0-87.9         221             12             98.2            5.3
88.0-89.9         225              4            100.0            1.8

Chart 8.8. Cumulative Distributions of Grades of the 1952 
Graduating Class of the United States Merchant Marine 
Academy. Data of Table 8.7. (Vertical axis: number of 
cadet-midshipmen; horizontal axis: grade.) 

Cumulative frequency tables and ogives are often used to present data 
of wages and of hours of work. With reference to wages, they enable us 
to ascertain how many (or what proportion) of a group receive less than a 
subsistence level, standard level, or comfort level. Similarly, we can 
ascertain the number or proportion receiving a subsistence level or more, 
a standard level or more, and a comfort level or more. It is also possible 
to ascertain what wage the lowest- (or highest-) paid 10, 25, 50, or other 
per cent of the workers are receiving. With respect to hours of work, we
can see quickly the number or proportion working unusually long or short 
hours. 

If two cumulative frequency distributions are based upon nearly the 
same number of items, their ogives may be plotted and compared in 
absolute terms. If, however, the two series are based upon different 
totals, the comparison must be based upon the percentage frequencies, 
just as in the case of comparing two frequency distributions in non- 
cumulative form, which was previously discussed. 
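
As an illustrative sketch in Python (not part of the original text), the
two cumulative columns of Table 8.7, in both absolute and percentage form,
can be rebuilt from the class frequencies of the grade distribution. The
helper name `cumulate` is a hypothetical choice made for this example.

```python
from itertools import accumulate

# Class frequencies for the nine grade classes 72.0-73.9 ... 88.0-89.9.
freqs = [7, 31, 42, 54, 33, 24, 22, 8, 4]


def cumulate(freqs):
    """Return the "less than" and "or more" cumulative frequencies.

    less_than[k] counts items below the upper limit of class k;
    or_more[k] counts items at or above the lower limit of class k.
    """
    total = sum(freqs)
    less_than = list(accumulate(freqs))
    or_more = [total - below for below in [0] + less_than[:-1]]
    return less_than, or_more


less_than, or_more = cumulate(freqs)
total = sum(freqs)

# Percentage frequencies allow series with different totals to be compared.
pct_less_than = [round(100 * c / total, 1) for c in less_than]
pct_or_more = [round(100 * c / total, 1) for c in or_more]

print(less_than)
print(or_more)
print(pct_less_than)
print(pct_or_more)
```

Plotting `less_than` against the upper class limits and `or_more` against
the lower class limits would give the two ogives.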



Symbols Used in Chapter 9

$\beta_1$: lower-case Greek beta, a measure of skewness. See Chapter 10.

$\beta_2$: lower-case Greek beta, a measure of kurtosis. See Chapter 10.

$d$: deviation of an $X$ value from $\bar{X}_d$.

$d'$: deviation, in terms of class intervals, of an $X$ value from $\bar{X}_d$.

$\Delta_1$: upper-case Greek delta, the difference between the frequency of the
modal class and the frequency of the class graphically to the left of the
modal class.

$\Delta_2$: upper-case Greek delta, the difference between the frequency of the
modal class and the frequency of the class graphically to the right of the
modal class.

$f$: a frequency.

$f_1, f_2, f_3, \ldots$: the frequencies associated with $X_1, X_2, X_3, \ldots$

$G$: the geometric mean.

$H$: the harmonic mean.

$i$: the class interval.

$l_1$: the lower limit of a class.

$l_2$: the upper limit of a class.

Med: the median.

Mo: the mode.

$n$: as used in the “compound interest formula,” the number of years (or
other time units) from the beginning to the end of the period.

$N$: the number of items in a sample.

$P_0$ and $P_n$: as used in the “compound interest formula,” respectively, the
value at the beginning and at the end of the period.

$Q_1, Q_2, Q_3$: the quartiles. $Q_2 = \text{Med}$.

$\Sigma$: upper-case Greek sigma, meaning “take the sum of.”

$r$: as used in the “compound interest formula,” the ratio of increase or
decrease per year (or other time unit).

$s$: the standard deviation of a sample. See Chapter 10.

$x$: the deviation of a value from $\bar{X}$.

$x_1, x_2, x_3, \ldots$: deviations of $X_1, X_2, X_3, \ldots$ from $\bar{X}$.

$X$: a value in a series; also, the mid-value of a class in a frequency
distribution.

$X_1, X_2, X_3, \ldots$: the values in a series; also, the mid-values of the
classes of a frequency distribution.

$\bar{X}_d$: a designated mean used as a first approximation to facilitate the
computation of $\bar{X}$ of a frequency distribution.

$\bar{X}$: the arithmetic mean. In later chapters, we shall distinguish between
the arithmetic mean of a sample, $\bar{X}$, and the arithmetic mean of the
population, $\bar{X}_{\mathcal{P}}$.

$\infty$: infinity.



CHAPTER 9 


Measures of Central Tendency 


We have seen how to construct a frequency distribution and how to
draw a frequency curve. From either the classified data or the chart,
it is obvious that there are certain values that are frequently present and
others that occur less frequently. Most of the curves that we encounter
are of the type that is very roughly “bell-shaped,” as shown in Charts
8.1, 8.2, and 8.3. For such series as these charts represent, it is obvious
that the more characteristic values are in the central part of the
distributions. We therefore use the term measures of central tendency to
identify the values which may be computed in an attempt to characterize this
aspect of a frequency distribution. We shall discuss in this chapter the
arithmetic mean, the median, the mode, and, briefly, the geometric mean
and the harmonic mean.

In the following chapter we shall consider measures of dispersion, which
refer to the spread of a distribution; measures of skewness, which measure
the direction and amount of asymmetry; and measures of kurtosis, which
indicate the degree of “peakedness” of a series.

THE ARITHMETIC MEAN 

The arithmetic mean from ungrouped data. The arithmetic
mean is in such constant everyday use that nearly all of us are familiar
with the concept. Sometimes we refer to the arithmetic mean merely as
“the average” or “the mean,” but we always use the appropriate adjective
when we are speaking of the geometric mean, the harmonic mean, or some
other less usual mean.

The arithmetic mean of a series of items is obtained by adding the
values of the items and dividing by the number of items. Suppose that,
in a certain small city, carrots are selling for 8¢, 10¢, 11¢, and 12¢ a
pound. The arithmetic mean of these four figures would be given by

$$\frac{8\text{¢} + 10\text{¢} + 11\text{¢} + 12\text{¢}}{4} = \frac{41\text{¢}}{4} = 10.25\text{¢}.$$

If we let $X_1$, $X_2$, $X_3$, etc., indicate the various values; $N$, the
number of items; and $\bar{X}$, the arithmetic mean, we have

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}.$$

Or, more briefly, using the summation symbol $\Sigma$, we may say

$$\bar{X} = \frac{\Sigma X}{N}.$$

The foregoing computation of the arithmetic mean involved no con- 
sideration of the fact that different quantities of carrots may have been 
sold at the various prices. When an arithmetic mean is computed in this 
fashion, it may be referred to as a simple arithmetic mean. It is not cor- 
rect to refer to this mean as an unweighted arithmetic mean, since each 
of the prices was weighted equally. Let us proceed to compute a properly 
weighted arithmetic mean, considering the fact that there were sold
10,000 pounds of carrots at 8¢, 8,000 pounds at 10¢, 4,000 pounds at 11¢,
and 1,000 pounds at 12¢. We now have

$$\bar{X} = \frac{(10{,}000 \times 8\text{¢}) + (8{,}000 \times 10\text{¢}) + (4{,}000 \times 11\text{¢}) + (1{,}000 \times 12\text{¢})}{23{,}000} = \frac{216{,}000\text{¢}}{23{,}000} = 9.39\text{¢}.$$

If we use the symbols $f_1$, $f_2$, $f_3$, etc., to indicate the numbers or
frequencies associated with each value being averaged, we have

$$\bar{X} = \frac{f_1X_1 + f_2X_2 + f_3X_3 + \cdots}{f_1 + f_2 + f_3 + \cdots} = \frac{\Sigma fX}{\Sigma f} = \frac{\Sigma fX}{N}.$$

Ordinarily an arithmetic mean is considered to be a weighted arithmetic 
mean, as just described, unless otherwise specified. 

It should be noted that, although the arithmetic mean price of carrots 
is 9.39¢ per pound, no carrots were actually sold at this exact price per
pound. The arithmetic mean must therefore be thought of as a com- 
puted value and not as a value which actually exists. 
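
The contrast between the simple and the weighted arithmetic mean of the
carrot prices can be sketched in Python as follows. The function names
`simple_mean` and `weighted_mean` are illustrative choices for this example,
not terms used in the text.

```python
# Prices in cents per pound and the pounds sold at each price.
prices = [8, 10, 11, 12]
pounds = [10_000, 8_000, 4_000, 1_000]


def simple_mean(values):
    """Sum of the values divided by the number of values (equal weights)."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


print(simple_mean(prices))                       # equal weight per price
print(round(weighted_mean(prices, pounds), 2))   # weighted by pounds sold
```

Passing equal weights to `weighted_mean` reproduces the simple mean, which
is why the simple mean is better described as equally weighted than as
unweighted.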

Properties of the arithmetic mean. One important property of the
arithmetic mean is that the algebraic sum of the deviations of the various
values from the mean equals zero. This is important, since it will enable
us to develop a method for computing $\bar{X}$ which will save an appreciable
amount of time when we are dealing with a frequency distribution. Let
us consider a series of five values, 6, 8, 9, 11, 14, each one of which occurs
but once. Then

$$\bar{X} = \frac{6 + 8 + 9 + 11 + 14}{5} = \frac{48}{5} = 9.6.$$

Now let us compute the deviation of each value from the arithmetic mean,
$x_1 = X_1 - \bar{X}$, $x_2 = X_2 - \bar{X}$, $x_3 = X_3 - \bar{X}$, etc. We have

     X        x

     6      -3.6
     8      -1.6
     9      -0.6
    11      +1.4
    14      +4.4

It will be observed that $\Sigma x = 0$; this is always true for any series of
values.¹

If we compute the deviations $d$ of the five items from some designated
value which is not the arithmetic mean, the sum of these deviations $\Sigma d$
will not equal zero. If the designated value is less than the arithmetic
mean, there will be too many positive deviations and the sum of the
deviations will be greater than zero. If the designated value is greater
than the arithmetic mean, there will be too many negative deviations and
the sum of the deviations will be a negative quantity. Since each of the
five ($N$) items has been compared to a designated number which is not
the true mean, the sum of the deviations will fail to equal zero by an
amount which is exactly five ($N$) times the amount by which the
designated value deviates from the actual arithmetic mean. It is therefore
possible to designate some value as an assumed mean $\bar{X}_d$, to determine the
deviations from this designated value, and, by adding (algebraically) the
necessary correction $\frac{\Sigma d}{N}$, to obtain the arithmetic mean.² The
process is illustrated in Table 9.1, where $\bar{X}_d$ is taken as 9. Here it is
observed that $\Sigma d = +3$. If we divide this figure by $N$, we see that
$\bar{X}_d$ was too small by 0.6. This is given by

$$\frac{\Sigma d}{N} = \frac{+3}{5} = +0.6.$$

Then

$$\bar{X} = \bar{X}_d + \frac{\Sigma d}{N} = 9 + \frac{3}{5} = 9.6,$$

which agrees exactly with $\bar{X}$ computed by adding the values and dividing
by 5.

¹ See Appendix S, section 9.1. If $\Sigma x = 0$, it is obvious that
$\frac{\Sigma x}{N} = 0$. $\frac{\Sigma x}{N}$ is referred to as the “first moment
about the mean,” or merely as the “first moment.” In the following chapter we
shall have occasion to consider the second moment $\frac{\Sigma x^2}{N}$, the
third moment $\frac{\Sigma x^3}{N}$, and the fourth moment $\frac{\Sigma x^4}{N}$.

² See Appendix S, section 9.2.

TABLE 9.1

Calculation of the Arithmetic Mean, $\bar{X}$,
by Use of the Assumed Mean, $\bar{X}_d = 9$

     X        d

     6       -3
     8       -1
     9        0
    11       +2
    14       +5
            ----
             +3

$$\Sigma d = +3.$$

$$\bar{X} = \bar{X}_d + \frac{\Sigma d}{N} = 9 + \frac{+3}{5} = 9.6.$$


In the foregoing illustration, $\bar{X}_d$ was less than $\bar{X}$. Suppose we choose
$\bar{X}_d$ as 13. The computations are shown in Table 9.2.

TABLE 9.2

Calculation of the Arithmetic Mean, $\bar{X}$,
by Use of the Assumed Mean, $\bar{X}_d = 13$

     X        d

     6       -7
     8       -5
     9       -4
    11       -2
    14       +1
            ----
            -17

$$\Sigma d = -17.$$

$$\bar{X} = \bar{X}_d + \frac{\Sigma d}{N} = 13 + \frac{-17}{5} = 9.6.$$


In this case, $\bar{X}_d$ was larger than $\bar{X}$, as is indicated by

$$\frac{\Sigma d}{N} = \frac{-17}{5} = -3.4.$$

The result is, as before, $\bar{X} = 13 - 3.4 = 9.6$.
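
The assumed-mean procedure of Tables 9.1 and 9.2 can be sketched in Python
as below. The function name `mean_from_assumed` is a hypothetical helper
introduced only for this illustration.

```python
import math

values = [6, 8, 9, 11, 14]


def mean_from_assumed(values, assumed):
    """Arithmetic mean as assumed mean plus the correction (sum of d) / N."""
    deviations = [x - assumed for x in values]   # d = X - assumed mean
    return assumed + sum(deviations) / len(values)


direct = sum(values) / len(values)               # ordinary computation

for assumed in (9, 13):
    result = mean_from_assumed(values, assumed)
    # Either assumed mean leads back to the directly computed mean.
    print(assumed, round(result, 2), math.isclose(result, direct))

# Deviations taken from the true mean sum to zero (up to rounding error).
print(abs(sum(x - direct for x in values)) < 1e-9)
```

Any assumed mean gives the same answer; a value near the centre of the
series merely keeps the deviations small.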

A second property of the arithmetic mean, which is of importance in
connection with later discussions, is that the sum of the squared deviations,
$\Sigma x^2$, is less when the deviations are taken around $\bar{X}$ than when they
are taken around any other value. This is demonstrated in Appendix S,
Section 10.1.
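
A small numerical check of this second property, using the five values of
Tables 9.1 and 9.2, might look as follows in Python. The helper name
`sum_sq` is an assumption of this sketch. The final line also illustrates
the standard identity that the sum of squares about any value $a$ exceeds
the sum about $\bar{X}$ by $N(\bar{X} - a)^2$.

```python
values = [6, 8, 9, 11, 14]
n = len(values)
mean = sum(values) / n


def sum_sq(values, center):
    """Sum of squared deviations of the values from `center`."""
    return sum((x - center) ** 2 for x in values)


about_mean = sum_sq(values, mean)
for center in (9, 13):
    about_other = sum_sq(values, center)
    excess = n * (mean - center) ** 2
    # The sum about any other value is larger by N * (mean - center)^2.
    print(center, about_other, round(about_mean + excess, 6))
```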

The arithmetic mean from grouped data: long method. Table
9.3 shows the frequency distribution of the grades of cadet-midshipmen,
and it is desired to ascertain the value of $\bar{X}$ for the series. When dealing
with a frequency distribution, we do not ordinarily have the original data
from which the frequency distribution was made. When we do have the
unclassified data (as in Table 8.1), we can obtain the value of the
arithmetic mean most accurately by totaling the values and dividing by the
number of items. When we have only the frequency distribution, we
must compute the mean from the grouped data. Let us proceed to compute
$\bar{X}$ for the frequency distribution of Table 9.3, and then compare our
result with the arithmetic mean computed from the unclassified data.

In computing the arithmetic mean from a frequency distribution, we 
take the mid-value (sometimes called the class mark) of each class as 
representative of that class, multiply the various mid-values by their 
corresponding frequencies, total these products, and divide by the total 
number of items. Symbolically, if $X_1, X_2, X_3, \ldots$ represent the
mid-values and $f_1, f_2, f_3, \ldots$ the frequencies, then

$$\bar{X} = \frac{f_1X_1 + f_2X_2 + f_3X_3 + \cdots}{f_1 + f_2 + f_3 + \cdots} = \frac{\Sigma fX}{\Sigma f} = \frac{\Sigma fX}{N}.$$

The mid-value of a class is obtained by adding the upper and lower
limits of the class and dividing by 2. For every frequency distribution,
we must consider carefully what those limits are. For the distribution
of Table 9.3, we might take the limits of the first class as 72.0 and 74.0,
giving a mid-value of 73.0. This would be correct if the grades had each
been rounded to the last completed tenth, so that 72.0 included values
ranging from exactly 72 to 72.099..., 72.1 included values from exactly
72.1 to 72.199..., and so on, instead of having been rounded to the
nearest tenth, as was actually done. If rounding had been to the last
completed tenth, the class should have been designated “72 and under
74.” Since we are dealing with a continuous variable, the limits of such
a class would be 72 and 74, and the mid-value 73. For the
cadet-midshipmen's grades, rounding was to the nearest tenth, and the lowest
value which could fall in the class “72.0-73.9” is 71.95, while the highest
value is 73.9499.... Thus, since the variable is continuous, the class limits
are 71.95 and 73.95, and the mid-value is 72.95. The mid-values have
been entered in Table 9.3 according to this procedure.

TABLE 9.3

Computation of the Arithmetic Mean for Grades of
the 1952 Graduating Class of the United States
Merchant Marine Academy by Use of the
Expression $\bar{X} = \frac{\Sigma fX}{N}$

               Number of
               cadet-          Mid-value
               midshipmen      of class
Grade              f               X              fX

72.0-73.9          7             72.95          510.65
74.0-75.9         31             74.95        2,323.45
76.0-77.9         42             76.95        3,231.90
78.0-79.9         54             78.95        4,263.30
80.0-81.9         33             80.95        2,671.35
82.0-83.9         24             82.95        1,990.80
84.0-85.9         22             84.95        1,868.90
86.0-87.9          8             86.95          695.60
88.0-89.9          4             88.95          355.80

Total            225                         17,911.75

$$\bar{X} = \frac{\Sigma fX}{N} = \frac{17{,}911.75}{225} = 79.61.$$

When a class is designated (for example) “32.00-33.99,” the mid-value
is actually 32.995. Many statisticians would, however, state the mid-value
as 33.00, since the relative discrepancy is small. In determining the
mid-values for a frequency distribution, it is important to know how the
readings were rounded. When no information concerning the rounding
is given in connection with the frequency distribution, it is probably best
to assume that figures were rounded to the nearest unit given. For
example, if a one-inch class is written “12.0-12.9 inches,” consider the
limits as 11.95 and 12.95 inches; if a five-pound class is written “10-14
pounds,” consider the limits as 9.5 and 14.5 pounds. However, for
discrete data, a $2 class “$10.00-$11.99” has the limits $10.00 and $11.99,
and a $10 class “$70-$79” has the limits $70 and $79 if data were given
only in whole dollars. A class should not be written “5 pounds but under
10 pounds” unless we mean exactly what we say; namely, that items in
this class do not fall below 5 pounds and do not equal 10 pounds. If the
classes for the cadet-midshipmen's grades were written 72.0-74.0,
74.0-76.0, and so on, and if cases falling on a class limit were divided
between the two classes, as noted in Chapter 8, the mid-values would be
73.0, 75.0, and so on.
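
The rules for class limits and mid-values described above can be collected
in a short Python sketch. The function `class_limits` and its `rounding`
labels ("nearest", "last_completed", "discrete") are names invented for
this illustration; they are not terminology from the text.

```python
def class_limits(lower, upper, unit, rounding="nearest"):
    """Return (lower limit, upper limit, mid-value) of a class written
    "lower-upper", where `unit` is the smallest recorded unit.

    "nearest":        readings rounded to the nearest unit
    "last_completed": readings rounded down to the last completed unit
    "discrete":       discrete data; the limits are the written values
    """
    if rounding == "nearest":
        lo, hi = lower - unit / 2, upper + unit / 2
    elif rounding == "last_completed":
        lo, hi = lower, upper + unit
    elif rounding == "discrete":
        lo, hi = lower, upper
    else:
        raise ValueError(f"unknown rounding rule: {rounding}")
    # Round to six places to remove binary floating-point noise.
    return tuple(round(v, 6) for v in (lo, hi, (lo + hi) / 2))


print(class_limits(72.0, 73.9, 0.1))                    # grades, as recorded
print(class_limits(72.0, 73.9, 0.1, "last_completed"))  # "72 and under 74"
print(class_limits(10, 14, 1))                          # "10-14 pounds"
print(class_limits(10.00, 11.99, 0.01, "discrete"))     # "$10.00-$11.99"
```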

Considering the mid-values for the grades of cadet-midshipmen as
discussed above, and using the expression $\bar{X} = \frac{\Sigma fX}{N}$, we find
that the arithmetic mean is 79.61, as shown below Table 9.3. From the
unclassified data of Table 8.1, let us compute the value of $\bar{X}$ to see how
nearly the figure just obtained agrees with that value. If we total all of the
individual grades and divide by 225, we have

$$\bar{X} = \frac{17{,}912.3}{225} = 79.61.$$

The two values for $\bar{X}$ agree to two decimal places. It is unusual for them
to agree so closely, but we can generally count on a difference of not more
than a few per cent at most. The value of the arithmetic mean computed from
a frequency distribution will generally be in close agreement with the
arithmetic mean from the unclassified data if the variable is continuous
and the distribution is symmetrical. If (1) the distribution is skewed, or
if (2) the variable is discrete (or if the data are broken), or if both (1) and
(2) are true, the agreement will be less close. Likewise, close agreement
cannot be expected if the data contain irregularities because an unduly
small sample was used.
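
The long method can be checked with a few lines of Python. This is only a
sketch: the frequencies and mid-values are those of the grade distribution,
and the total of the unclassified grades (17,912.3) is the figure quoted in
the text, since the individual grades themselves are not reproduced here.

```python
# Frequencies and mid-values of the nine grade classes.
freqs = [7, 31, 42, 54, 33, 24, 22, 8, 4]
first_mid, interval = 72.95, 2
mids = [first_mid + interval * k for k in range(len(freqs))]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, mids))   # total of f * X
grouped_mean = sum_fx / n

# Mean from the unclassified data, using the total quoted in the text.
ungrouped_mean = 17912.3 / n

print(round(sum_fx, 2), round(grouped_mean, 2), round(ungrouped_mean, 2))
```

Carrying more decimal places shows a small difference between the two
means, which reflects the mid-value assumption.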

Whenever lack of agreement between the two values for $\bar{X}$ is present,
it is due to the inadequacy of the mid-value assumptions. It is almost
always true that none of the mid-values is actually the true concentration
point of its class. However, a glance at Chart 8.1, 8.2, or 8.3 will suggest
that, for groups to the left of the group of maximum frequency, the
mid-value of a group is probably less than the mean of that group; while for
groups to the right of the group of maximum frequency, the mid-value of
a group probably exceeds the mean of that group. Although all the
mid-value assumptions are usually incorrect, there is a definite tendency for
the errors to offset each other, provided the distribution is approximately
symmetrical. For the data of cadet-midshipmen's grades, we have the
unclassified data from which the frequency distribution was made and we
can compute the arithmetic mean for each class and compare the class
means and class mid-values. This has been done in Table 9.4, where it
may be seen that for the first 3 classes the mid-value of each class is less
than the class mean. For the last 5 classes, 2 of the mid-values exceed
their class means and 2 of the mid-values are less than their class means;
in the case of one class, the mid-value and the class mean are the same.

TABLE 9.4

Comparison of the Class Mid-values with the Arithmetic Mean for
each Class for the Grades of Cadet-Midshipmen

              Number of      Total of grades     Arithmetic     Mid-value
              cadet-         in each class       mean for       of each
Grade         midshipmen     (from Table 8.4)    each class     class

72.0-73.9          7               511.0            73.00         72.95
74.0-75.9         31             2,331.7            75.22         74.95
76.0-77.9         42             3,236.0            77.05         76.95
78.0-79.9         54             4,255.6            78.81         78.95
80.0-81.9         33             2,666.0            80.79         80.95
82.0-83.9         24             1,991.5            82.98         82.95
84.0-85.9         22             1,868.8            84.95         84.95
86.0-87.9          8               696.0            87.00         86.95
88.0-89.9          4               355.7            88.92         88.95

Total            225            17,912.3            79.61          ...

The arithmetic mean from grouped data: short methods. In
Tables 9.1 and 9.2 it was shown that we could assume a value for the
arithmetic mean and, making use of the fact that $\Sigma x = 0$, compute the
necessary correction to obtain $\bar{X}$. This method will save us appreciable
time in computing the mean from a frequency distribution. The expression
for $\bar{X}$ is as before, except that the symbol $f$ is introduced because of
the frequencies in the various classes. Thus,

$$\bar{X} = \bar{X}_d + \frac{\Sigma fd}{N}.$$

The selected value for $\bar{X}_d$ may be the mid-value of any class. In Table
9.5 $\bar{X}_d$ has been taken as the mid-value of the fourth class, and the
computations below the table show that $\bar{X} = 79.61$, the same as found by the
longer method of Table 9.3.

TABLE 9.5

Computation of the Arithmetic Mean for Grades of
the 1952 Graduating Class of the United States
Merchant Marine Academy by Use of the
Expression $\bar{X} = \bar{X}_d + \frac{\Sigma fd}{N}$

               Number of
               cadet-
               midshipmen
Grade              f            d          fd

72.0-73.9          7           - 6        - 42
74.0-75.9         31           - 4        -124
76.0-77.9         42           - 2        - 84     -250
78.0-79.9         54             0
80.0-81.9         33           + 2        + 66
82.0-83.9         24           + 4        + 96
84.0-85.9         22           + 6        +132
86.0-87.9          8           + 8        + 64
88.0-89.9          4           +10        + 40     +398

Total            225                               +148

$$\bar{X} = \bar{X}_d + \frac{\Sigma fd}{N} = 78.95 + \frac{148}{225} = 78.95 + 0.658 = 79.61.$$

It will be observed that all of the classes of Table 9.5 are of the same
width. When this is true, we may further shorten our computation of $\bar{X}$
by taking our deviations from $\bar{X}_d$ in terms of class intervals, $d'$. Our
correction $\frac{\Sigma fd'}{N}$ will then be in terms of class intervals and must
be multiplied by the class interval $i$ before being algebraically added to
$\bar{X}_d$. For the arithmetic mean, then,

$$\bar{X} = \bar{X}_d + \frac{\Sigma fd'}{N}\,i.$$

The computation of $\bar{X}$ by this expression is shown in Table 9.6 and
yields the same result as given in Tables 9.3 and 9.5. This method should
always be used when a frequency distribution is made up of equal class
intervals. The greater the number of classes and the greater the number
of items included in a frequency distribution, the more time is saved by
this procedure.
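
The short method with deviations measured in class intervals can be
sketched in Python as follows. The function `grouped_mean_short` and its
parameter names are hypothetical; `assumed_index` is the zero-based position
of the class whose mid-value serves as the assumed mean.

```python
freqs = [7, 31, 42, 54, 33, 24, 22, 8, 4]   # grade distribution
first_mid, interval = 72.95, 2              # first mid-value, class interval i


def grouped_mean_short(freqs, first_mid, interval, assumed_index):
    """Mean of an equal-interval distribution via class-interval deviations."""
    n = sum(freqs)
    # d' = number of class intervals between each class and the assumed class.
    d_prime = [k - assumed_index for k in range(len(freqs))]
    sum_fd = sum(f * d for f, d in zip(freqs, d_prime))
    assumed_mean = first_mid + assumed_index * interval
    # The correction is in class intervals, so it is multiplied by i.
    return assumed_mean + (sum_fd / n) * interval


# Fourth class (index 3, mid-value 78.95) taken as the assumed mean.
print(round(grouped_mean_short(freqs, first_mid, interval, 3), 2))

# Any other class may be chosen; the result does not change.
print([round(grouped_mean_short(freqs, first_mid, interval, k), 2)
       for k in range(len(freqs))])
```

Because the deviations are small integers, the products are easy to form by
hand, which is the saving the short method offers.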

TABLE 9.6

Computation of the Arithmetic Mean for Grades of
the 1952 Graduating Class of the United States
Merchant Marine Academy by Use of the
Expression $\bar{X} = \bar{X}_d + \frac{\Sigma fd'}{N}\,i$

               Number of
               cadet-
               midshipmen
Grade              f            d'         fd'

72.0-73.9          7           -3         -21
74.0-75.9         31           -2         -62
76.0-77.9         42           -1         -42     -125
78.0-79.9         54            0
80.0-81.9         33           +1         +33
82.0-83.9         24           +2         +48
84.0-85.9         22           +3         +66
86.0-87.9          8           +4         +32
88.0-89.9          4           +5         +20     +199

Total            225                              + 74

$$\bar{X} = \bar{X}_d + \frac{\Sigma fd'}{N}\,i = 78.95 + \frac{74}{225} \times 2 = 78.95 + 0.658 = 79.61.$$

The arithmetic mean from grouped data having unequal class
intervals. For a frequency distribution having unequal class intervals,
the computation of $\bar{X}$ by the method shown in Table 9.6 would be
awkward because fractional values of $d'$ would be involved. The
appropriate procedure is either that shown in Table 9.3 or that of
Table 9.5. When classes vary in width, the distribution is invariably
skewed, and we must remember that, as skewness increases, the errors in
our mid-value assumptions offset each other less closely. Thus the mean
computed from a frequency distribution having unequal class intervals
may differ markedly from the mean computed from the unclassified data.
Furthermore, as will be discussed at the end of this chapter, the arithmetic
mean of a decidedly skewed distribution is of limited usefulness. When
a frequency distribution, such as that of Table 8.5, has a class of
indeterminate width at one end (or, occasionally, both ends), there is no
indication of the value which should be chosen as representative of the
class. If it is assumed that the indeterminate group has the same width
as the preceding one, the mid-value will usually be too low. The use of
such a mid-value may result in offsetting the upward bias of the
preceding mid-values, but we can never be sure how much offsetting takes
place or that it may not even overbalance the bias. The reason a class
is left indeterminate is usually that it contains a few items scattered over
a wide range of values.

It should be emphasized that the value of the arithmetic mean com- 
puted for a skewed distribution having unequal class intervals is only a 
reasonably good approximation. It becomes even less accurate when one 
or two indeterminate classes are present. The difficulty involved in the 
computation of the mean for such a distribution is completely resolved if 
a footnote is added to the table giving the total of the unclassified data. 
If this procedure is followed, a single division suffices to give the value of 
the arithmetic mean. 

Modified forms of the arithmetic mean. Instead of computing 
the arithmetic mean for all of a series of items, it may occasionally suffice 
to make an approximation by taking the average of the smallest and 
largest figures. The result of such a procedure will not differ greatly 
from the arithmetic mean if we are dealing with a continuous variable (or 
a discrete variable which does not show gaps) the distribution of which is 
symmetrical or nearly so. For example, meteorologists have found that 
it is not ordinarily necessary to take hourly temperatures throughout a 
day and average these 24 readings to arrive at the daily mean temperature.
It ordinarily suffices to average only the maximum and minimum
temperatures. These two readings may be obtained from the high and 
low points shown on the graph traced by a recording thermometer, or 
they may be had from a thermometer which automatically records the 
maximum and the minimum temperatures. 

It will be recalled that the data of cadet-midshipmen's grades are skewed
to the right. Consequently we should expect the average of the lowest
and highest grades to exceed the arithmetic mean computed from all of
the grades. Let us determine the average of these two extreme values
and see how far it departs from $\bar{X}$. The highest grade shown in Table 8.2
is 89.6, while the lowest grade is 72.1. The average of these two grades is
80.85. The value of $\bar{X}$ computed from the unclassified data was found
to be 79.61. Although the discrepancy resulting from averaging the
extremes is only 1.24, or 1.6 per cent, we should not use this method as an
approximation of $\bar{X}$ unless the distribution is symmetrical or nearly so.
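
The arithmetic behind this comparison is brief enough to verify directly.
The Python sketch below uses only the three figures quoted in the text; the
variable names are illustrative.

```python
lowest, highest = 72.1, 89.6    # extreme grades quoted from Table 8.2
mean_all = 79.61                # mean of all 225 grades, as quoted

mid_range = (lowest + highest) / 2          # average of the two extremes
discrepancy = mid_range - mean_all
pct_discrepancy = 100 * discrepancy / mean_all

print(round(mid_range, 2), round(discrepancy, 2), round(pct_discrepancy, 1))
```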

A second modification of the arithmetic mean is one which will be 
referred to again in connection with the measurement of seasonal
movements (Chapter 14). This modification consists essentially either of
ignoring certain items on the basis that they are unusual extreme values, 
perhaps resulting from the introduction of a non-homogeneous or non- 
comparable factor into the situation, or of dropping one or more of the 
highest and lowest values in an array so that only the more typical values 
are averaged. 

Suppose that a runner has competed in the 100-yard dash in ten track
meets during a season, and that his times were as follows:

10.2, 10.1, 10.0, 10.0, 10.1, 10.0, 9.9, 10.1, 11.4, 10.2 seconds

Now an arithmetic mean of these ten figures is 10.2 seconds, although
only three races were run this slow or slower. In the race represented by
the ninth figure above, the runner was spiked and limped in to finish an
extremely poor last. The figure 11.4 does not indicate his running ability
and could quite logically be excluded in arriving at a mean time which
represents this runner's ability. If we average the other nine figures,
we obtain 10.07 seconds as the arithmetic mean for this runner under
normal running conditions. In like fashion, if one race had been run
with a strong wind at the runner's back, his time would be abnormally
short for the 100 yards and that figure, too, might be omitted.³ The
procedure just described differs from the one followed in measuring seasonal
movements in that only the particular values for which a specific reason
could be definitely assigned have been eliminated. When measuring
seasonal movements, we shall drop one, two, or more items at both ends of
an array in order to average the items which seem to cluster around some
central value.
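
Both kinds of modified mean can be sketched in Python using the runner's
times. The helper names `mean_excluding` and `mean_dropping_extremes` are
inventions of this example. The second function illustrates the idea of
dropping items at both ends of an array; it is not a reproduction of the
seasonal procedure of Chapter 14.

```python
times = [10.2, 10.1, 10.0, 10.0, 10.1, 10.0, 9.9, 10.1, 11.4, 10.2]


def mean(values):
    return sum(values) / len(values)


def mean_excluding(values, excluded_positions):
    """Mean after omitting specific items for which a reason is known."""
    kept = [v for i, v in enumerate(values) if i not in excluded_positions]
    return mean(kept)


def mean_dropping_extremes(values, k):
    """Mean after dropping the k lowest and k highest items of the array."""
    if 2 * k >= len(values):
        raise ValueError("k is too large for this series")
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])


print(round(mean(times), 2))                      # all ten races
print(round(mean_excluding(times, {8}), 2))       # ninth race (index 8) omitted
print(round(mean_dropping_extremes(times, 1), 4)) # fastest and slowest dropped
```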

³ A discussion of this type of modified mean when used in connection with time
studies is given in F. E. Croxton and D. J. Cowden, Practical Business Statistics,
2nd ed., Prentice-Hall, Inc., New York, 1948, pp. 171-176.

Averaging percentages. It was pointed out in Chapter 7 that a
series of percentages based on different numbers should ordinarily be
averaged by weighting each percentage in proportion to its base. There are
conditions, however, under which we might want to ignore the different
bases and to average several percentages using a different system of
weights. For example, let us assume that a student has taken two
comprehensive examinations, each covering one-half of the subject matter of
a course. Suppose that the first examination included 100 “true-false”
questions, upon which he made 82 per cent, while the second included 150
such questions, upon which he made 88 per cent. Since each percentage
represents a level of accomplishment for one-half of the work of a term, a
better description of the work of the student for the term would weight the
two percentages equally, resulting in an average of

$$\frac{82 + 88}{2} = 85$$

rather than weight the percentages according to the number of questions
asked, giving

$$\frac{(100 \times 82) + (150 \times 88)}{250} = 85.6.$$

If the second examination had been based upon 10 essay questions, it
is even more apparent that the weighting should not be determined by the
number of questions included.
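
The two weighting systems for the examination percentages can be compared
in a short Python sketch. The variable names are illustrative only.

```python
scores = [82, 88]          # per cent correct on each examination
questions = [100, 150]     # number of questions on each examination

# Each examination covers half the course, so the scores are weighted equally.
equal_weight = sum(scores) / len(scores)

# Weighting by the number of questions asked gives a different figure.
by_questions = sum(q * s for q, s in zip(questions, scores)) / sum(questions)

print(equal_weight, round(by_questions, 1))
```

The choice between the two results rests on what the percentages are meant
to describe, not on the arithmetic.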

Averaging averages. The general outlines of the problem of aver- 
aging averages are the same as those involved in averaging percentages. 
If we have several averages, each referring to a category, and wish to 
average these averages in order to arrive at a statement compatible with 
that referring to the total composed of these categories, it is necessary to 
weight each average according to the importance of its category. For 
example, if seven football linemen averaged 210 pounds in weight and 
four backfield players averaged 186 pounds, we might add the two means 
and divide by 2, obtaining 198 pounds. That, however, is not the correct 
arithmetic mean for the weights of the eleven players. We obtain the 
correct figure from 


$$\frac{(7 \times 210) + (4 \times 186)}{11} = \frac{2{,}214}{11} = 201 \text{ pounds}.$$

This is the figure we would get if we added the individual weights for the
eleven players and divided by eleven. 

As in the case of percentages, there may be some instances in which the 
importance of each category is dependent upon some factor other than 
the number of items included in the category. Suppose that 12 tires have
been run on a group of test trucks unloaded except for the driver, and
have shown an average mileage of 13,618 miles. Suppose that 20 similar
tires have been used on a similar group of test trucks each carrying the
driver and 2,000 pounds of load, and have shown an average mileage of
12,136 miles. The weighted average of mileage would be

$$\frac{(12 \times 13{,}618) + (20 \times 12{,}136)}{32} = 12{,}692 \text{ miles}.$$

What we have done is to assign $\frac{20}{12} = 1.67$ times as much weight to the
second average as to the first. Actually, trucks sometimes travel
unloaded, sometimes loaded, sometimes partly loaded, and sometimes
overloaded. If the trucks in our illustration travel 1/5 of their mileage
unloaded and 4/5 of their mileage loaded, we should arrive at our average by

$$\frac{(1 \times 13{,}618) + (4 \times 12{,}136)}{5} = 12{,}432 \text{ miles}.$$

It is the importance of the various load conditions in the use of the truck
which should be considered in weighting rather than the number of tires
tested.
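
The football and tire illustrations both reduce to a weighted mean of
averages under different systems of weights. A Python sketch follows, in
which `weighted_mean` is a hypothetical helper name.

```python
def weighted_mean(averages, weights):
    """Weighted arithmetic mean of several category averages."""
    return sum(w * a for a, w in zip(averages, weights)) / sum(weights)


# Football players: weights are the numbers of players in each category.
team_weight = weighted_mean([210, 186], [7, 4])

# Tire mileages (unloaded, loaded) under two systems of weights.
mileages = [13_618, 12_136]
by_tires_tested = weighted_mean(mileages, [12, 20])   # number of tires tested
by_share_of_use = weighted_mean(mileages, [1, 4])     # 1/5 and 4/5 of mileage

print(round(team_weight), round(by_tires_tested), round(by_share_of_use))
```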


THE MEDIAN

The median from ungrouped data. The median is usually
defined as that value which divides a distribution so that an equal number 
of items is on either side of it. If we have five items, $5, $6, $7, $8, $10, 
it is apparent that the value of the median is $7, since there are two items 
below that value and two items above it. If we have six items, 2 inches, 
5 inches, 6 inches, 7 inches, 9 inches, 12 inches, it is clear that any value 
greater than 6 inches and less than 7 inches will satisfy our definition. As 
a matter of practice, when there are an even number of items, we usually 
take the value of the median as halfway between the two central items. 
In this instance the median would be 6.5 inches. 

If we are dealing with a series of values such as 12, 13, 14, 15, 15, 17, and
18 pounds, there is no value which is so located that three items are
smaller than it and three items are larger than it. We would, however,
designate 15 pounds as the median. It must be obvious that the definition
first given does not hold for situations such as this. The definition
is therefore recast thus: the median is that value which divides a series so
that one-half or more of the items are equal to or less than it and one-half or
more of the items are equal to or greater than it.

From what has already been said, it is obvious that the median cannot 
readily be located unless the data have been put into an array or, as we 
shall see shortly, into a frequency distribution. It will be recalled that 
no arranging is necessary for computing the mean, since the items of a 
series may be totaled no matter what their order. 

The value of the median of a series may or may not coincide with the
value of an existing item. When there is an odd number of items in an
array, the value of the median coincides with that of one of the items;
when there is an even number of items in an array, it does not, unless the
two central items happen to be equal.
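
As a minimal sketch of the rule for ungrouped data, the following Python function arrays the items and takes the central item, or the value halfway between the two central items when their number is even. The name `median_ungrouped` is an assumption made for illustration; the three series are those used in the text.

```python
def median_ungrouped(items):
    """Median of ungrouped data: array the items, then take the middle."""
    ordered = sorted(items)          # the data must first be put into an array
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty series is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]          # odd N: the central item itself
    # even N: halfway between the two central items
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_ungrouped([5, 6, 7, 8, 10]))               # dollars
print(median_ungrouped([2, 5, 6, 7, 9, 12]))            # inches
print(median_ungrouped([12, 13, 14, 15, 15, 17, 18]))   # pounds
```

The three calls give 7, 6.5, and 15, in agreement with the values stated above.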

An important property of the median, which will be referred to again,
is that it is influenced by the position of the items in the array but not
by the size of the items. It has already been observed that the median
of $5, $6, $7, $8, $10 is $7. The two larger items may have any values
greater than $7 and the two smaller items may have any values smaller
than $7, yet the median remains $7.

Before proceeding to a consideration of the computation of the median 
for grouped data, let us compute the value of the median for the grades 
of the 225 cadet-midshipmen arrayed in Table 8.2. We want to find the 
value which is so located that 112 items will be on either side of it. This
is, of course, the value of the 113th item,* and counting from either end
reveals that the value of the median is 79.0. If we had an array of 200
items, we should find the value which divides the distribution so that 100
items fall below and 100 above it. This is obviously the mean of the
100th and 101st items counted from either end of the array. 

The median from grouped data. To determine the value of the
median of a frequency distribution, we count half of the frequencies from
either end of the distribution in order to ascertain the value on either side
of which half of the frequencies fall. To determine the value of the
median for the grades of the cadet-midshipmen (Table 9.6), we first
compute N/2 = 225/2 = 112.5. We then proceed to ascertain the value of
the median.


There are 80 frequencies included in the first three classes of the distribu- 
tion. The estimated value of the median is therefore obtained by inter- 
polating 32.5 frequencies (112.5 — 80) into the fourth class, assuming 
that the frequencies in that class are evenly distributed within the class. 
The median, then, is given by the expression 


$$\text{Med} = 77.95 + \frac{32.5}{54} \times 2 = 77.95 + 1.20 = 79.15.$$


Exactly the same result is obtained if we begin our computations from the 
other end of the distribution. There are 91 frequencies included in the 
last five classes, and we proceed to interpolate 21.5 frequencies (112.5 — 
91) into the fourth class, from the upper limit toward the lower limit. The
result is

$$\text{Med} = 79.95 - \frac{21.5}{54} \times 2 = 79.95 - 0.80 = 79.15.$$

The value of the median is, of course, the same whether we begin our
computations from one end or the other.

* For ungrouped data it may seem convenient to find the value of the median by
counting (N + 1)/2 items, beginning with the highest (or lowest) item in the
array. This is not the same as saying that the median is the (N + 1)/2th item.
Although some persons hold this concept, it is not satisfactory. The concept
of the middle item as the median is unsatisfactory when the array consists of
an even number of items, and must be abandoned when the median is determined
from grouped data.

The value of 79.15 just obtained for the median from the frequency 
distribution is in very close agreement with that of 79.0 found from the 
array. Unless the data contain gaps or irregularities, we can expect 
rather close agreement when dealing with a continuous variable, and like- 
wise for a discrete variable if the data are not broken. 
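
The interpolation from either end of the distribution may be sketched in Python as follows. The function names are illustrative assumptions; the class limits and frequencies (lower limit 77.95, upper limit 79.95, 54 items in the class, 80 items below it and 91 above it) are those quoted in the text for the class containing the median.

```python
def interpolate_from_lower(lower_limit, cum_below, class_freq, width, target):
    """Count `target` frequencies upward from the low end of the distribution.

    Assumes the items in the class are evenly distributed within it.
    """
    return lower_limit + (target - cum_below) / class_freq * width


def interpolate_from_upper(upper_limit, cum_above, class_freq, width, target):
    """Count `target` frequencies downward from the high end."""
    return upper_limit - (target - cum_above) / class_freq * width


n = 225
half = n / 2   # 112.5 frequencies on either side of the median

med_from_low = interpolate_from_lower(77.95, 80, 54, 2, half)
med_from_high = interpolate_from_upper(79.95, 91, 54, 2, half)

print(round(med_from_low, 2), round(med_from_high, 2))
```

Both directions give 79.15, because the 32.5 frequencies counted into the class from below and the 21.5 counted from above together account for all 54 items in the class.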

We have now computed the values of the arithmetic mean and the
median for the frequency distribution of cadet-midshipmen’s grades.
The mean was 79.61. The median was 79.15. The mean exceeds the
median because the distribution is skewed to the right. If a distribution
is exactly symmetrical, the mean and the median are identical. If a
distribution is skewed to the left, the mean will be less than the median.
This point will be treated more fully at the end of this chapter and in the 
following chapter. In Chapter 10 we shall see that one way of measuring 
skewness involves consideration of the values of the mean and the median. 

The computation of the median from a frequency distribution of 
unequal class intervals does not differ from that just described. Neither 
does the presence of indeterminate groups at either or both ends com- 
plicate the procedure. 

If an ogive of a distribution is plotted, it is possible to obtain the value 
of the median graphically, as shown in Chart 9.1. The process is the 
graphic equivalent of the computations already made and consists of the
following steps: (1) Compute N/2 and locate this point on the vertical scale.
(2) Draw a perpendicular to the Y-axis at this point and extend the
perpendicular to intersect the ogive. (3) At the intersection, drop a
perpendicular to the X-axis. The intersection with the X-axis gives the value
of the median. From Chart 9.1 it is seen that, for the grades of the
cadet-midshipmen,
the value of the median, located graphically, is 79.2, which is in close 
agreement with that computed arithmetically. 

The quartiles, quintiles, deciles, and percentiles. The median
characterizes a series of values because of its midway position. There are 
several other measures of the frequency distribution which, taken indi- 
vidually, are not measures of central tendency but, as we shall see later, 
may be used to assist in measuring dispersion and skewness. They are, 
however, allied to the median in that they are based upon their position 
in a series. We shall therefore digress at this point to discuss the quartiles,
quintiles, deciles, and percentiles.

There are three quartiles, $Q_1$, $Q_2$, and $Q_3$, which divide the distribution
into four equal parts. $Q_2$ is, of course, the median and is generally so
designated. To determine the value of $Q_1$, the first or lower quartile, for
the data of cadet-midshipmen’s grades, we count N/4 = 225/4 = 56.25
frequencies from the lower limit of the first class. Thus for the value of
$Q_1$ we have

$$Q_1 = 75.95 + \frac{18.25}{42} \times 2 = 76.82.$$

The same result may be obtained by counting 3N/4 from the upper limit of
the last class.

[Chart 9.1. Graphic Location of the Median for Grades of the
1952 Graduating Class of the United States Merchant Marine
Academy. Vertical axis: number of cadet-midshipmen; horizontal axis: grade.
Data of Table 9.6.]

The value of the third quartile $Q_3$ may be computed by counting 3N/4
from the lower limit of the first class or, more expeditiously, by counting
N/4 from the upper limit of the last class. Since N/4 = 56.25, and since
there are 34 frequencies in the last three classes, we have

$$Q_3 = 83.95 - \frac{22.25}{24} \times 2 = 82.10.$$


There are four quintiles, which divide the distribution into five equal 
parts; nine deciles, which divide the distribution into ten equal parts; and 
ninety-nine percentiles, which divide the distribution into 100 equal parts. 
The procedure for computing these values is similar to that for the median 
and the quartiles. For example, we shall compute the value of the 3rd
decile, which is also the 30th percentile. We count 3N/10 = 675/10 = 67.5
from the lower limit of the first class and interpolate. Since there are
38 frequencies in the first 2 groups, we have

$$75.95 + \frac{29.5}{42} \times 2 = 77.35.$$


Unless a distribution is very extensive, there would be no purpose served 
in computing very many of the percentiles. Frequent use is made of 
only a few of them, such as the 99th, 98th, 95th, 90th, 85th, 80th, and so 
forth. 
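
Since the quartiles, quintiles, deciles, and percentiles are all located by the same interpolation, one small Python function can serve for any of them. This is a sketch under stated assumptions: the function name and its parameters are illustrative, the caller must supply the class in which the desired position falls, and the only figures used are those quoted in the text (N = 225; the third class has lower limit 75.95, width 2, 42 frequencies, and 38 frequencies below it).

```python
def grouped_quantile(k, parts, n, lower_limit, cum_below, class_freq, width):
    """Interpolate the k-th dividing value when a distribution of n items
    is split into `parts` equal parts (4 = quartiles, 10 = deciles, ...).

    lower_limit, class_freq, width describe the class containing the
    position k*n/parts; cum_below is the number of items in lower classes.
    """
    target = k * n / parts
    if not cum_below <= target <= cum_below + class_freq:
        raise ValueError("position %.2f does not fall in the class supplied" % target)
    return lower_limit + (target - cum_below) / class_freq * width


N = 225
third_class = dict(lower_limit=75.95, cum_below=38, class_freq=42, width=2)

third_decile = grouped_quantile(3, 10, N, **third_class)    # position 67.5
first_quartile = grouped_quantile(1, 4, N, **third_class)   # position 56.25

print(round(third_decile, 2), round(first_quartile, 2))
```

The third decile comes out at 77.35, as computed above. The check on `target` guards against interpolating in the wrong class, which is the usual slip when these measures are worked by hand.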

The terms quartile, quintile, decile, and percentile are sometimes used in
a different sense, to refer to the part of the distribution in which an item
falls. Thus, if a student is said to be in the upper quartile of his class,
he is in the upper 25 per cent. If he is in the upper decile of his class, he
is in the upper 10 per cent. It would undoubtedly lead to clarity of
expression if we reserved quartiles, quintiles, deciles, and percentiles to
mean the measures discussed at the opening of this section. To refer to
the part of a distribution in which a student falls, we could say "highest
quarter" (above Q3), "second highest quarter" (between Q2 and Q3),
"third highest quarter" (between Q1 and Q2), and "lowest quarter"
(below Q1). Similarly, we could say "fifths" in place of quintiles,
"tenths" instead of deciles, and "hundredths" instead of percentiles.


THE MODE 

The mode from ungrouped data. The mode of a distribution is the
value at the point around which the items tend to be most heavily
concentrated. It may be regarded as the most typical of a series of values.
For this very reason it is apparent that the occurrence of one or a few
extremely high (or low) values has no effect upon the mode.* If a series
of data is unclassified, not having been either arrayed or put into a
frequency distribution, the mode cannot be readily located.

Taking first an extremely simple illustration: If seven men are receiving
daily wages of $5, $6, $7, $7, $7, $8, $10, it is clear that the modal wage 
is $7 per day. If we have a series of values such as 

3, 5, 6, 7, 9, 10, 11, 

it is apparent that there is no mode. 

The mode from grouped data. If we examine the array of cadet- 
midshipmen’s grades shown in Table 8.2, we find that it would be very 
difficult to determine the value around which the items tend to
concentrate. The mode may be located readily by referring to a frequency
distribution such as Table 9.6. Here it is clear that the modal group is 
78.0-79.9; and if we take the mid-value as representative of the class, we 
should call 78.95 the mode. 

However, there is evidence here that the mid-value is not the best esti- 
mate of the mode. Since there are more frequencies in the class pre- 
ceding the modal class than there are in the class following the modal 
class, it is logical to expect that the actual concentration is toward the 
lower limit of the class. We shall make use of the frequencies in these 
two adjacent classes to infer the probable concentration point within the 
modal class. The expression is 


$$\text{Mo} = l_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i,$$

where $l_1$ = the lower limit of the modal class;

$\Delta_1$ = the difference between the frequency of the modal class and
the frequency of the preceding class (sign neglected);

$\Delta_2$ = the difference between the frequency of the modal class and
the frequency of the following class (sign neglected);

$i$ = the interval of the modal class.


* This is true in respect to the usual method of locating the mode which is described
here. If the mode is located by the expression

$$\text{Mode} = \bar{X} - \frac{\sigma \sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)},$$

or by determining the X value just below the peak of a fitted curve, the extreme
values do have some slight influence. The computation of $\sigma$, $\beta_1$, and $\beta_2$ is discussed
in the following chapter.



For the frequency distribution of
grades of the cadet-midshipmen,

$$\text{Mo} = 77.95 + \frac{54 - 42}{(54 - 42) + (54 - 33)} \times 2 = 77.95 + \frac{12}{33} \times 2 = 78.68.$$

The interpolation which we have made may be illustrated graphically
as shown in Chart 9.2. It should be realized that we are merely making
an estimate of the value of the mode. Nevertheless, it is a useful estimate,
and it should be remembered that the mode has two important
properties: first, that it represents the most typical value of the
distribution and should coincide with existing items; second, that the
mode (as usually computed) is not affected by the presence of extremely
large or small items.
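
The interpolation for the mode can be written as a short Python function. The name `grouped_mode` and its parameter names are illustrative assumptions; the frequencies 42, 54, and 33 and the lower limit 77.95 are those used in the computation for the cadet-midshipmen's grades.

```python
def grouped_mode(lower_limit, f_modal, f_preceding, f_following, width):
    """Estimate the mode within the modal class of a frequency distribution.

    d1 and d2 are the differences (sign neglected) between the modal
    frequency and the frequencies of the preceding and following classes.
    """
    d1 = abs(f_modal - f_preceding)
    d2 = abs(f_modal - f_following)
    if d1 + d2 == 0:
        raise ValueError("adjacent classes equal the modal class; no interpolation possible")
    return lower_limit + d1 / (d1 + d2) * width


mode_estimate = grouped_mode(77.95, f_modal=54, f_preceding=42, f_following=33, width=2)
print(round(mode_estimate, 2))
```

The result, 78.68, lies below the mid-value of the class (78.95) because the preceding class holds more items than the following one.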

Graphically we may obtain the 
mode from a column diagram, as in 
Chart 9.2. We may make a very 
rough approximation of the mode by 
reading the value on the X-axis cor- 
responding to the highest point of the 
frequency curve or corresponding to 
the steepest portion of the ogive. 
The curves may be smoothed freehand, since, unless the series has been
subjected to a smoothing process, we would obtain a value about the same
as the mid-value of the modal group.

Chart 9.2. Diagrammatic Illustration of the Method of Interpolating
for the Value of the Mode. $\Delta_1$ exerts an upward influence, and $\Delta_2$
exerts a downward influence, each in proportion to its magnitude, so that the
mode divides the interval of the modal class into two parts proportional to
$\Delta_1$ and $\Delta_2$. That is,

$$\frac{\text{Mo} - l_1}{l_2 - \text{Mo}} = \frac{\Delta_1}{\Delta_2}.$$

Geometrically, the mode may be located by dropping a vertical line from
the intersection of the two diagonals as shown on the diagram.

Algebraically the expression

$$\text{Mo} = l_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

may be developed as follows: We wish to locate the mode so that

$$\frac{\text{Mo} - l_1}{l_2 - \text{Mo}} = \frac{\Delta_1}{\Delta_2},$$

$$\Delta_2 \text{Mo} - \Delta_2 l_1 = \Delta_1 l_2 - \Delta_1 \text{Mo},$$

$$\Delta_1 \text{Mo} + \Delta_2 \text{Mo} = \Delta_1 l_2 + \Delta_2 l_1,$$

$$\text{Mo}(\Delta_1 + \Delta_2) = \Delta_1 l_2 + \Delta_2 l_1.$$

But $l_2 = l_1 + i$. Hence

$$\text{Mo} = \frac{\Delta_1 l_1 + \Delta_1 i + \Delta_2 l_1}{\Delta_1 + \Delta_2}
= \frac{\Delta_1 l_1 + \Delta_2 l_1}{\Delta_1 + \Delta_2} + \frac{\Delta_1 i}{\Delta_1 + \Delta_2}
= l_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i.$$

Upon occasion, series are encountered which have two modes and are
referred to as bi-modal. Such a series is pictured in Chart 9.3. Sometimes
bimodality is the result of chance; sometimes it results because of the
fact that two sets of non-homogeneous data are present. In Chart 9.3 the
two concentrations are attributable to the fact that some drivers were on
full- (or nearly full-) time work, while others were working only one or
two days a week.

CHARACTERISTICS OF THE MEAN, MEDIAN, AND MODE 

Before proceeding to a consideration of other measures of central tend- 
ency, we shall examine the characteristics of these three relatively simple 
and very important measures. 

[Chart 9.3. Distribution of Wages Received in Half Month by
Drivers in Bituminous Coal Mines, Illinois, 1933. Vertical axis: number of
drivers; horizontal axis: wages in dollars (0 to 100). Data from United
States Bureau of Labor Statistics, Wages and Hours of Labor in Bituminous-
Coal Mining: 1933, Bulletin No. 601, p. 61.]

Familiarity of the concept. The arithmetic mean is the most 
widely used of all the measures of central tendency. As will be pointed 
out later, it is frequently used under conditions which cause it to be mis- 
leading. The median is less well known than the arithmetic mean, but 
it is based on a simpler concept. Also less well known than the arithmetic 
mean, the concept of the mode as the most usual or typical of a group of 
items is probably the simplest of the three. 

The concepts of the three measures may be illustrated by means of the
three parts of Chart 9.4. The mean is at the point of balance, or center of
gravity, such that $\Sigma fx$ on one side of the mean equals $\Sigma fx$ on the other
side. The median divides the curve into two equal areas. The mode is
the value below the peak of the curve.

Algebraic treatment. The arithmetic mean may be treated algebraically:

(a) Since $\bar{X} = \frac{\Sigma X}{N}$, it follows that, if any two of the three factors (the
total, the arithmetic mean, the number of items) are known, the third may
be computed. Thus

$$N = \frac{\Sigma X}{\bar{X}}; \qquad \Sigma X = N\bar{X}.$$


(b) Using appropriate weights, a 
series of arithmetic means may be 
averaged to yield the arithmetic mean 
of all the data on which those means 
were based. 

The median does not lend itself to 
the type of algebraic treatment dis- 
cussed for the arithmetic mean. Al- 
gebraic treatment of the mode, similar 
to that sketched for the mean, is not 
possible. 

Need for classifying data. The arithmetic mean may be computed
from unclassified data, from arrayed data, from the frequency distribution,
or (as noted above) merely from a knowledge of the total $\Sigma X$ and the
number of items $N$. When the arithmetic mean is computed from a
frequency distribution, the value of $\bar{X}$ will very closely approximate the
value of $\bar{X}$ for the unclassified data. The more nearly symmetrical the
distribution, the closer the agreement of these two values.

In order that the value of the median may be computed, the data
must be in an array (at least the central items must be arrayed) or in a
frequency distribution. The median determined from the frequency
distribution will agree approximately with that computed from the array if
the distribution of items is regular within the class containing the median.

[Chart 9.4. Location of the Arithmetic Mean, the Median, and the Mode in a
Frequency Distribution Skewed to the Right. Panel A: the values to the right
of $\bar{X}$ balance the values to the left of $\bar{X}$. Panel B: one half of the
area under the curve is on each side of the ordinate erected at the median.
Panel C: the mode is directly beneath the peak of the curve.]

The mode is most readily located from the frequency distribution, and 
only with some difficulty from an array. King* has pointed out that an
array of the cities of the United States according to population of each 
would show no mode. However, if such data were put into classes, a 
modal tendency might appear. It should be borne in mind that the 
process of interpolating for the modal value within the modal group is at 
best only an approximation. More refined methods of locating the mode 
involve essentially the smoothing of the data by formula and the deter- 
mination of the X value of the maximum ordinate. 

Effect of unequal class intervals. When classes vary in width, the
value of the arithmetic mean may still be computed. Such a variation of
class intervals is usually necessitated by the presence of marked skewness
(almost invariably to the right, or positive), which results in a value for
$\bar{X}$ which may not be in close agreement with that based on the unclassified
data. The value of $\bar{X}$ from such a positively skewed frequency distribution
would be expected to exceed the value of $\bar{X}$ from the unclassified data.

The median may ordinarily be determined rather satisfactorily from a 
frequency distribution having varying class intervals. The upper quar- 
tile or one or more of the upper quintiles or deciles might, however, fall in 
a wide class having few frequencies. The necessary interpolation would 
in such a case be unreliable. 

When the class intervals of a frequency distribution vary in width, the 
mode may be satisfactorily located if the modal group and those on either 
side of it are of the same width. Otherwise the determination is apt to 
be of limited accuracy. 

Effect of classes with open end. The presence of a "Less than
. . ." class at one end of a frequency distribution and/or an ". . . or
more" class at the other end results in an inaccurate determination of
$\bar{X}$, since mid-values ordinarily cannot be satisfactorily determined for
such classes.

The presence of open-end classes has no effect upon the determination 
of the median. 

Indeterminate groups do not complicate the process of locating the
modal value. Occasionally, as when working with an extremely skewed
or a reverse J-shaped distribution, the mode is at or near the end of the
distribution. Under such conditions there would be no reason for having
an indeterminate group at that end of the distribution. Incidentally, in
the case of such distributions, the mode is not a measure of central
tendency.

* Willford I. King, The Elements of Statistical Method, The Macmillan Company, New
York, 1919, p. 126.

Effect of skewness. For a symmetrical distribution, the mean,
median, and mode are identical. If the symmetrical distribution is 
altered by merely extending one tail so that the distribution is skewed, 
there is no necessary change in the value of the mode (as usually com- 
puted), but the median is changed in the direction of the skewness. Thus 
positive skewness (skewness to the right) increases the value of the 
median. The mean is increased even more, since it is affected not only 
by the fact that there is now an excess of frequencies on one side of the 
mode, but also by the amount by which the various excess frequencies 
deviate from the mode. Although the distribution of grades of the cadet- 
midshipmen is only slightly skewed, the effect of the presence of skewness 
is seen when we recall that the mode is 78.68, the median is 79.15, and the 
mean is 79.61. These values are shown on Chart 10.7. 

Effect of extreme values. When skewness is not general but is due to 
a few items deviating a great deal from the mode, the median will be only 
slightly affected. The arithmetic mean, however, is affected by the value 
of every item in the series, and the presence of one or a few extremely 
large (or extremely small) items in a series may result in a mean which is 
very misleading. As ordinarily computed, the mode is not at all influ- 
enced by the presence of a few unusually high (or low) extreme values. 

The foregoing is of such great importance that we shall give further 
attention to it. Suppose we have the following series of seven values, 

$12, $14, $15, $15, $16, $18, $19, 

the mean of which is $15.57, the median $15, and the mode $15. If an 
extreme value of $25 is added to these seven, the arithmetic mean 
becomes $16.75, the median $15.50, while the mode remains $15. Now if, 
instead of having added $25 as the eighth item, we add $200, the mean 
becomes $38.62, but the median is still $15.50 and the mode $15. The 
effect upon the median of any added value from $16 to ∞ is the same.
The mode was not at all affected by the extreme value, although, if we had 
added a $16 item, it would have been affected. This illustrates a different 
point, also; namely, that the mode is not a useful measure unless it is 
based upon enough items to show a well-defined concentration. 
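
The behavior of the three measures when an eighth item is added can be checked with Python's standard `statistics` module. The helper name `summarize` is an illustrative assumption; the series and the added values of 25, 200, and 16 dollars are those discussed above.

```python
from statistics import mean, median, multimode


def summarize(series):
    """Return (mean, median, list of modes) for a series of values."""
    return mean(series), median(series), multimode(series)


base = [12, 14, 15, 15, 16, 18, 19]   # the seven values, in dollars

for extra in (None, 25, 200, 16):
    series = base if extra is None else base + [extra]
    m, med, modes = summarize(series)
    print(extra, round(m, 3), med, modes)
```

With 25 added the mean is 16.75 and with 200 added it is 38.625, while the median stays at 15.5 and the mode at 15 in both cases. Adding 16 instead leaves two values, 15 and 16, tied for most frequent, which is the sense in which the mode "would have been affected."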

Because of the effect of extreme values upon the arithmetic mean, it is 
sometimes a misleading figure to use to describe a distribution. If we are 
considering the income of a group of people, and if most of them have 
moderate incomes but one or a few have extremely high (or low) incomes, 
the mean will reflect these extremes and to that extent will be atypical 
rather than typical. An alumni association made a study of graduates
who had been out of college 20 years. Among other questions asked was
one concerning income during a specific year. More than 350
questionnaires were sent out; only 133 replies were received. There is a large
probability that these replies were selective and any figures derived
therefrom would be of doubtful value. The mean income of the 133 replying
was $13,958, but this high average was due to the fact that there were
several very large incomes which were definitely extreme values. The
median income was $7,500, while the mode was very close to $5,000. In
such a case as this, we should not use the mean alone to describe the dis- 
tribution. If only one figure is to be used, it is better to use the median 
or mode, depending upon which concept is of more importance. It 
would be much better, of course, to give all three values, and, if possible, 
a frequency distribution or a frequency curve. 

Sometimes in dealing with a series in which suspected heterogeneity is 
present, it may be advisable to use the median in lieu of the arithmetic 
mean. For example, measurements might have been taken of the weight 
of a number of goldfish, and the figures may reveal the presence of several 
unusually large specimens. It is suspected that, because of ignorance or 
carelessness, the enumerator included a few carp with the goldfish. The 
questionable values could be discarded. However, we are not sure that 
the heavy fish were carp, and perhaps their measurements should not be 
discarded. The use of the median allows the extreme values to be repre- 
sented by their position in the series rather than by their size. 

Sometimes we have a series in which there are present extremes of which 
we know the number but not the individual values. In such a situation 
we can determine the median or the mode, but not the mean. 

When we have a series of values extending over a great range, any con- 
cept of a measure of central tendency is dubious. Suppose we have the 
values 4, 6, 2,000, and 2,100. It is obvious that a mean or a median could 
be computed, but that neither would have any practical meaning. 

Effect of irregularity of data. When data are broken or irregular, 
the value of the mean computed from a frequency distribution may be 
decidedly different from the value based on the unorganized data. 

The same is true in the case of the median if gaps occur among the items 
falling in the class containing the median. When gaps occur in the 
vicinity of the median, the median is not a particularly good concept to 
use, as its value would be erratic if one or two items were added to or 
subtracted from the series. 

If a mode is clearly defined, there are not likely to be gaps near that 
value. When gaps are present near the mode, it is quite likely that there 
are too few items in the series for the mode to be either clearly defined or
meaningful.

Reliability when based on samples. In Chapter 24 we shall discuss
the variation which may be expected in values of the arithmetic mean
when based on repeated random samples. This volume will not treat
of the sampling variation of medians or modes. However, for samples
of the same size from a normal population, the median is subject to
greater sampling variation than is the arithmetic mean, and the mode is
more variable than the median.

Mathematical properties. The arithmetic mean has two important
properties: first, $\Sigma x = 0$; and second, $\Sigma x^2$ = a minimum (where x is the
deviation of an item from the mean). Because of this latter property, the
mean is the usual basis of reference for measures of dispersion. The mean
is an important function in many processes which will follow in later
sections of this book. Among other uses, it is essential for fitting the
normal curve to observed data.

The sum of the deviations from the median (signs neglected) is a mini- 
mum. For this reason, certain measures of dispersion are sometimes 
based upon the median. 
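
These properties can be checked numerically. The sketch below is an illustration, not a proof: it reuses the seven-value series from the discussion of extreme values and compares the mean and the median against a handful of arbitrarily chosen alternative reference points. The helper names are assumptions made for the example.

```python
from statistics import mean, median

values = [12, 14, 15, 15, 16, 18, 19]
m = mean(values)
med = median(values)


def sum_dev(center):
    """Sum of deviations from `center`, signs retained."""
    return sum(v - center for v in values)


def sum_sq_dev(center):
    """Sum of squared deviations from `center`."""
    return sum((v - center) ** 2 for v in values)


def sum_abs_dev(center):
    """Sum of deviations from `center`, signs neglected."""
    return sum(abs(v - center) for v in values)


# First property: deviations from the mean sum to zero (up to rounding).
print(abs(sum_dev(m)) < 1e-9)

# Second property: squared deviations are smallest about the mean.
print(all(sum_sq_dev(m) <= sum_sq_dev(c) for c in (m - 1, m - 0.1, m + 0.1, m + 1, med)))

# Median property: absolute deviations are smallest about the median.
print(all(sum_abs_dev(med) <= sum_abs_dev(c) for c in (med - 1, med - 0.5, med + 0.5, med + 1, m)))
```

All three checks print `True` for this series.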

Selection of appropriate measure. Using the foregoing measures 
as descriptive devices, the statistician may be faced with the problem of 
deciding which one to use to characterize a given set of data. In general, 
the measure of central tendency that he should use depends upon (1) the 
nature of the distribution of the data and (2) the concept of central tend- 
ency which is desired for a particular purpose. 

If the distribution is symmetrical, or approximately so, the three meas- 
ures may be used almost interchangeably. If a series is skewed, we must 
bear in mind that the arithmetic mean is frequently not a typical value, 
and that it may be better to use the mode (w^hich is typical) or the 
median. When there are extreme deviations or when there is suspected 
heterogeneity, we may use the median in place of the mean, or recourse 
may be had to a modified mean. 

If $\bar{X}$ is computed, use may be made of that value to obtain a total.
Thus, if adults average 150 pounds in weight, it is safe to load about 20 
people in an elevator rated to carry 3,000 pounds. (The figure of 150 
pounds is somewhat high for the average weights of adults, but it is the 
figure frequently used to compute elevator capacity. It is obvious that 
the 20 people referred to should not all be heavy persons.) If subsequent 
computations are to be made involving a measure, the mean may be 
required. If a curve is to be fitted to a frequency distribution, the mean 
will probably be used. If one series of data is eventually to be compared 
with another in respect to dispersion, the mean may be needed. This,
however, does not mean that the median or the mode should not be used 
for describing either or both of the series. 

The relative standing of a person in a class may be indicated by stating
whether his grade is better than the grades of half of the members. This
rating involves the use of the median. Other statements referring to
various proportions of the students may be made by using quartiles,
quintiles, deciles, or percentiles.

If we are interested in knowing the typical annual expenditure of
motorists for gasoline, we should make use of the mode.

Since the three measures embody different concepts, it may sometimes 
be advisable to use two or possibly all three. The use of the mean and 
the mode, or the mean and the median, gives us an idea of the amount of 
skewness present, as will be shown in the next chapter. 

Sometimes it is necessary to make a quick estimate of the central tend- 
ency of a series. Under such conditions, the mode may be promptly esti- 
mated from a frequency distribution, and the median may be quickly 
approximated from either an array or a frequency distribution. Of 
course, if the total and the number of items are given, the arithmetic 
mean may be computed in a few seconds. 

MINOR MEANS 

The arithmetic mean, median, and mode are frequently thought of as 
the more important measures of central tendency, because of their wide 
usefulness, simplicity, and general applicability. Under certain condi- 
tions other measures of central tendency may be useful, and we shall 
therefore consider the geometric mean and the harmonic mean. As 
pointed out earlier, the term "mean" is frequently used to designate the
arithmetic mean; consequently, when referring to any other mean such 
as the geometric mean or the harmonic mean, we should always refer to 
the measure by its complete designation. 

The geometric mean. The geometric mean is defined as "the Nth
root of the product of the items." Thus, for the four items 5, 8, 10, 12,
the geometric mean is

$$G = \sqrt[4]{5 \times 8 \times 10 \times 12} = \sqrt[4]{4800} = 8.3.$$

It is interesting to note that the arithmetic mean of these four items is
8.75. For any series of positive values (not all the same), the geometric
mean is smaller than the arithmetic mean.* If one value of a series equals
zero, the geometric mean equals zero and is therefore inappropriate. If 
one or more values are negative, the geometric mean can sometimes be 
computed but may be meaningless. These are important drawbacks to 
its use. 


* For a demonstration, see Appendix S, section 9.3.

Symbolically, the geometric mean is

$$G = \sqrt[N]{X_1 \cdot X_2 \cdot X_3 \cdots X_N}.$$

The computation is usually carried out by means of logarithms, thus:

$$\log G = \frac{\log X_1 + \log X_2 + \log X_3 + \cdots + \log X_N}{N} = \frac{\Sigma \log X}{N}.$$


The logarithm of the geometric mean is thus the arithmetic mean of the 
logarithms of the values. 
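
The logarithmic computation may be sketched in Python as follows. The function name is an illustrative assumption, and common (base 10) logarithms are used only because they match the tables a reader of this book would consult; any base gives the same result. The four items are those of the example above.

```python
import math


def geometric_mean_by_logs(values):
    """Geometric mean as the anti-logarithm of the mean of the logarithms."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("the logarithmic method requires positive values")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log   # anti-logarithm


items = [5, 8, 10, 12]
g = geometric_mean_by_logs(items)
arithmetic = sum(items) / len(items)

print(round(g, 1), arithmetic)
```

The geometric mean rounds to 8.3 and the arithmetic mean is 8.75, so the geometric mean is the smaller of the two, as stated above.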

When frequencies are present, each logarithm must be multiplied by the 
corresponding frequency. Thus 


$$\log G = \frac{f_1 \log X_1 + f_2 \log X_2 + f_3 \log X_3 + \cdots}{N} = \frac{\Sigma f \log X}{N}.$$


For a frequency distribution, the geometric mean is usually computed by: 
(1) ascertaining the logarithm of the mid-value of each class, (2) multi- 
plying each logarithmic mid-value by its proper frequency, (3) summing 
these products, (4) dividing by the number of items, and (5) taking the 
anti-logarithm of the result. If a series is symmetrical in a logarithmic 
sense (see Chapter 23) and the items are evenly distributed within the 
classes geometrically instead of arithmetically, it is preferable to use the 
mid-values of the logarithms of the class limits rather than the logarithms
of the mid-values of the classes. If the raw data are available, it is, of 
course, also advisable to re-form the frequency distribution in order to 
make the class intervals geometrically equal, if that had not already been 
done. 
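The logarithmic procedure described above may be sketched in a few lines of Python. This is only an illustration; the function name `geometric_mean` and its optional `freqs` argument are our own labels, not notation from the text. When frequencies are supplied, each logarithm is multiplied by its frequency, as in steps (1) to (5).

```python
import math

def geometric_mean(values, freqs=None):
    """Geometric mean by logarithms; freqs are optional frequencies (hypothetical helper)."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(v <= 0 for v in values):
        # Zero or negative values make the geometric mean inappropriate.
        raise ValueError("geometric mean requires positive values")
    n = sum(freqs)                                   # number of items, N
    mean_log = sum(f * math.log10(v) for v, f in zip(values, freqs)) / n
    return 10 ** mean_log                            # anti-logarithm of the mean logarithm

print(round(geometric_mean([5, 8, 10, 12]), 1))      # the four items used in the text
print(geometric_mean([4, 9], freqs=[2, 2]))          # each value occurring twice
```

Base-10 logarithms are used only to mirror hand computation with log tables; any base gives the same result.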

It will be recalled that the arithmetic mean is the sum of the values 
divided by the number, while the geometric mean is the Nth root of the 
product of the values. As noted before, N times \(\bar{X}\) gives \(\Sigma X\). For the
geometric mean, \( G^N = X_1 \cdot X_2 \cdot X_3 \cdot \) etc.; that is, the geometric mean
raised to the Nth power equals the product of the values. This leads to
the rather interesting point that any series of numbers having the same
N and the same \(\Sigma X\) have the same arithmetic mean (for example, 1 and
11, 2 and 10, 4 and 8, 5 and 7, -2 and 14 all have an arithmetic mean of
6), and that any series of numbers having the same N and the same
product have the same geometric mean (for example, 1 and 36, 2 and 18,
4 and 9 all have the geometric mean of 6).

Another property of the geometric mean is that the product of the 
ratios of the values on one side of the geometric mean to the geometric 
mean is equal to the product of the ratios of the geometric mean to the 
values on the other side of the geometric mean. To illustrate, let us take
the values 4, 5, 20, 25, the geometric mean of which is \( \sqrt[4]{10000} = 10 \).
The ratios of the values 4 and 5 to the geometric mean are 4/10 and 5/10,
while the ratios of the geometric mean to the values 20 and 25 are 10/20 and
10/25. Thus we have

\[ \frac{4}{10} \cdot \frac{5}{10} = \frac{10}{20} \cdot \frac{10}{25}, \]

\[ \frac{1}{5} = \frac{1}{5}. \]

Similarly, we may reverse the ratios to write

\[ \frac{10}{4} \cdot \frac{10}{5} = \frac{20}{10} \cdot \frac{25}{10}, \]

\[ 5 = 5. \]


The following paragraphs discuss certain instances in which the geo- 
metric mean is useful. 

(1) The geometric mean may be used for averaging ratios. Consider 
the following data: 


Community   Native-born   Foreign-born   Ratio of foreign-born   Ratio of native-born
            inhabitants   inhabitants    to native-born          to foreign-born
                                         (per cent)              (per cent)

A           8,000         4,000          50                      200
B           1,500         3,000          200                     50

The arithmetic mean of the two ratios of foreign-born to native-born 
population is 125 per cent. Likewise, the arithmetic mean of the two 
ratios of native-born to foreign-born population is 125 per cent! These
two averages are inconsistent with each other. This incongruous result
does not occur if we use the geometric mean, for the geometric mean of
each of the two pairs of ratios is \( \sqrt{0.50 \times 2.00} = 1.0 \), or 100 per cent. We
could, of course, total or average the foreign-born inhabitants for the two 
communities, and total or average the native-born inhabitants, thus 
obtaining two ratios which are consistent. There are 7,000 foreign-born 
and 9,500 native-born inhabitants, or an average of 3,500 foreign-born 
and 4,750 native-born inhabitants. The ratio of foreign-born to native- 
born is 


\[ \frac{7{,}000}{9{,}500} = \frac{3{,}500}{4{,}750} = 73.7 \text{ per cent,} \]

and the ratio of native-born to foreign-born is

\[ \frac{9{,}500}{7{,}000} = \frac{4{,}750}{3{,}500} = 135.7 \text{ per cent.} \]
The product of these two ratios is 1. This arithmetic method, however,
does not assign equal weight to the two ratios. Observe that the arithmetic
method involves the ratio of the arithmetic means (or totals),
whereas the geometric procedure involves the geometric mean of the
ratios. We have here two different concepts. Which one to use in a
given situation depends upon the purpose. If we wish to establish a
typical ratio for a number of communities and wish that ratio to be independent
of the number of native-born or foreign-born persons present in
the various places (that is, we wish to assign equal weight to each ratio),
we may use the geometric mean of the ratios. If we wish to allow the
populations to exert an influence, we may determine the ratio of the totals
or arithmetic means. The question is not whether to use an arithmetic
or a geometric mean of the ratios, but whether to use a ratio based on
arithmetic means (or totals) or a geometric mean of ratios.

If the two ratios of foreign-born to native-born are averaged arithmetic- 
ally but weighted according to the native-born populations, the result is 
73.7 per cent. If the two ratios of native-born to foreign-born are aver- 
aged arithmetically but weighted according to the foreign-born popula- 
tion, we obtain 135.7 per cent. These figures, of course, agree with those 
obtained by taking the ratios of the totals. 
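The comparison of the two concepts for communities A and B can be checked with a short Python sketch. The variable names are ours, chosen for illustration; the population figures are those of the table above.

```python
import math

native = {"A": 8000, "B": 1500}
foreign = {"A": 4000, "B": 3000}

# The two ratios for each community, as decimals rather than per cents.
f_to_n = [foreign[c] / native[c] for c in native]    # 0.50 and 2.00
n_to_f = [native[c] / foreign[c] for c in native]    # 2.00 and 0.50

# Arithmetic means of the ratios: both are 1.25, which is inconsistent.
am_fn = sum(f_to_n) / len(f_to_n)
am_nf = sum(n_to_f) / len(n_to_f)

# Geometric means of the ratios: their product is 1, so they agree.
gm_fn = math.sqrt(f_to_n[0] * f_to_n[1])
gm_nf = math.sqrt(n_to_f[0] * n_to_f[1])

# Ratio of the totals, and the same figure as a weighted arithmetic mean
# of the ratios with the native-born populations as weights.
ratio_totals_fn = sum(foreign.values()) / sum(native.values())
weighted_fn = sum(r * native[c] for r, c in zip(f_to_n, native)) / sum(native.values())

print(am_fn, am_nf, gm_fn, gm_nf)
print(round(100 * ratio_totals_fn, 1), round(100 * weighted_fn, 1))
```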

The geometric mean may be used when we wish to assign equal weight 
to equal ratios of change. Suppose (a) that two commodities are selling 
at $2 and $10 per unit; (b) that at a later date the first commodity doubles 
in price while the second one is halved in price, and thus they sell for $4 
and $5, respectively; and (c) that at a still later date the original price of 
the first commodity is halved and becomes $1, while that of the second 
commodity is doubled and becomes $20. The arithmetic mean under 
these three situations yields: (a) $6; (b) $4.50; and (c) $10.50. The geo- 
metric mean gives: (a) $4.47; (b) $4.47; and (c) $4.47. The assumption 
used to justify the geometric mean is illustrated by saying that a doubling 
in price offsets a halving in price, a quadrupling in price offsets a price of 
one-fourth the original figure, and similarly for any other two ratios whose 
product is 1. This characteristic will be referred to again concerning a 
possible use of the geometric mean in connection with price index numbers. 
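The three price situations may be verified with the following sketch (the dictionary `situations` is merely our own way of arranging the figures given above). The arithmetic mean changes from one situation to the next, while the geometric mean does not, because the product of the two prices is 20 in every case.

```python
import math

# Prices of the two commodities in situations (a), (b), and (c).
situations = {"a": (2, 10), "b": (4, 5), "c": (1, 20)}

for label, (p1, p2) in situations.items():
    arithmetic = (p1 + p2) / 2
    geometric = math.sqrt(p1 * p2)
    print(label, arithmetic, round(geometric, 2))
```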

(2) Sometimes a frequency distribution is encountered which is mark- 
edly skewed to the right. If, instead of plotting the mid-values of the 
classes, we use the logarithms of the mid-values (or better, plot the
logarithmic mid-values, the geometric mean of each pair of limits, on a
logarithmic X-scale) and a symmetrical distribution results, a geometric
analysis may be proper. This is discussed more fully in Chapter 23. 

(3) Probably the most frequently used application of the geometric 
principle has to do with the determination of average per cent of change. 
If a city had a population of 100,000 in a given year and 120,000 ten years
later, what was the average annual per cent of change? The change was 
20 per cent over the entire period. If we take one-tenth of that figure, or 
2 per cent, as the annual per cent of increase and compute a 2 per cent 
increase each year over the preceding year, the second population figure 
turns out to be 121,900! Obviously the correct figure is slightly smaller 
than 2 per cent, since we are actually compounding. We may compute 
the average annual per cent of change by using 

\[ P_n = P_0 (1 + r)^n, \]

where \(P_0\) = population at beginning of period;
      \(P_n\) = population at end of period;
      r = relative increase (or decrease) per year, expressed as a decimal;
      n = number of years.

For the data above, 

\[ 120{,}000 = 100{,}000 (1 + r)^{10}. \]

Solving this by the use of logarithms gives

\[
\begin{aligned}
5.079181 &= 5.000000 + 10 \log (1 + r), \\
\log (1 + r) &= \frac{0.079181}{10} = 0.0079181, \\
1 + r &= 1.0184, \\
r &= 1.84 \text{ per cent.}
\end{aligned}
\]


The expression \( P_n = P_0 (1 + r)^n \) is sometimes termed the compound
interest formula because of its usefulness in various problems involving
compound interest. We have used it above to determine average annual
per cent of growth.* Knowing values of any three of the four symbols
shown, we can solve for the fourth. Thus we may determine: 


(a) Average annual per cent of change r.

(b) Population a given number of years later, \(P_n\), assuming a constant
relative change.

(c) Number of years n until a given population will be attained, again
assuming a constant relative change.


* In the above discussion we found the average per cent of growth between two
selected points. Sometimes we wish to find the average per cent of growth which
best describes a number of values for different years. Such an average is not dependent
upon only the first and last values of a series and is therefore more likely to be
a representative figure. A method of fitting a curve to obtain such an average is
given in Chapter 18.
(d) Population a given number of years earlier, \(P_0\), if the per cent of
change was constant.

It should be noted that the assumption of a constant relative change for 
population is not valid over extended periods for any except possibly 
"new" countries.
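The four uses of the compound interest formula can be sketched as four small Python functions. The function names are hypothetical labels of our own; each simply solves \( P_n = P_0 (1 + r)^n \) for one of the four symbols, assuming a constant relative change throughout.

```python
import math

def annual_rate(p0, pn, n):
    """(a) Average annual relative change r, as a decimal."""
    return (pn / p0) ** (1 / n) - 1

def later_value(p0, r, n):
    """(b) Value n years later."""
    return p0 * (1 + r) ** n

def years_needed(p0, pn, r):
    """(c) Number of years until pn is attained."""
    return math.log(pn / p0) / math.log(1 + r)

def earlier_value(pn, r, n):
    """(d) Value n years earlier."""
    return pn / (1 + r) ** n

r = annual_rate(100_000, 120_000, 10)
print(round(100 * r, 2))                       # average annual per cent of change
print(round(later_value(100_000, 0.02, 10)))   # result of a flat 2 per cent each year
```

The second line printed shows why one-tenth of the total change overstates the annual rate: compounding 2 per cent for ten years carries the population past 120,000.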

The harmonic mean. The harmonic mean H is the reciprocal of the 
arithmetic mean of the reciprocals of the values. The expression is 


\[ H = \frac{1}{\dfrac{\dfrac{1}{X_1} + \dfrac{1}{X_2} + \dfrac{1}{X_3} + \cdots + \dfrac{1}{X_N}}{N}}. \]

For purposes of computation, it is more convenient to use the form 


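\[ H = \frac{N}{\dfrac{1}{X_1} + \dfrac{1}{X_2} + \dfrac{1}{X_3} + \cdots + \dfrac{1}{X_N}} = \frac{N}{\Sigma \dfrac{1}{X}} \]

or

\[ \frac{1}{H} = \frac{\dfrac{1}{X_1} + \dfrac{1}{X_2} + \dfrac{1}{X_3} + \cdots + \dfrac{1}{X_N}}{N} = \frac{\Sigma \dfrac{1}{X}}{N}. \]

The harmonic mean of the two values 3 and 12 is

\[ \frac{1}{H} = \frac{\dfrac{1}{3} + \dfrac{1}{12}}{2} = \frac{5}{24}, \qquad H = 4.8. \]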


For these same values, the arithmetic mean is 7.5, while the geometric
mean is \( \sqrt{3 \times 12} = 6 \). For any series of values (not all the same or not
including zero as one value), the harmonic mean is smaller than either the
geometric or the arithmetic mean.*
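As a brief check on the ordering of the three means, the following sketch computes each for the values 3 and 12. The three function names are our own; the definitions follow the expressions given in the text and assume positive values.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # Nth root of the product of the items (positive values assumed).
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    # Reciprocal of the arithmetic mean of the reciprocals.
    return len(xs) / sum(1 / x for x in xs)

values = [3, 12]
print(arithmetic_mean(values), geometric_mean(values), harmonic_mean(values))
```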

The harmonic mean is so rarely computed for a frequency distribution 
that we shall merely note the procedure, which consists of multiplying 
the reciprocal of each mid-value (or mid-value of the reciprocals of the 


* See Appendix S, section 9.4.
class limits) by its frequency, adding these products, dividing by N, and
taking the reciprocal of the result. 

While the harmonic mean is not a measure of great importance, it is 
often confusing and hence we shall give a somewhat extended explanation 
and indicate several possible applications. 

Application (1). Although oranges are not usually priced in this
fashion, let us suppose that two grades of oranges are selling at 10 for $1 
and 20 for $1. The arithmetic mean may be computed as 


\[ \bar{X} = \frac{10 + 20}{2} = 15. \]


That is, 15 for $1, or $0.067 per orange. This is the price we must pay per
orange if we spend equal amounts of money for each grade. Paying $0.067
for each of 30 oranges, we shall spend $2.00 for the lot.

The harmonic mean gives a different result:

\[ H = \frac{2}{\dfrac{1}{10} + \dfrac{1}{20}} = \frac{2}{\dfrac{3}{20}} = \frac{40}{3} = 13\tfrac{1}{3}. \]


That is, 13 1/3 for $1, or $0.075 per orange. This is the price we must pay
per orange if equal numbers of oranges are bought at each price. Thus, if
we buy 15 oranges at 10 for $1 and 15 oranges at 20 for $1, we shall spend
$2.25 for all 30. Similarly, if we buy 30 oranges at $0.075 each, we shall
spend $2.25 for the lot.

The harmonic mean will give the same results as the arithmetic mean 
if we weight by the quantities bought at each price. Thus 



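\[ H = \frac{10 + 20}{10\left(\dfrac{1}{10}\right) + 20\left(\dfrac{1}{20}\right)} = \frac{30}{2} = 15 \]

oranges per $1, or $0.067 per orange, assuming equal amounts of money spent
for each grade.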

If prices are quoted in the usual way, as so much per dozen, these 
oranges are selling at $1.20 per dozen and $0.60 per dozen. The simple 
arithmetic mean is: 


\[ \bar{X} = \frac{\$1.20 + \$0.60}{2} = \$0.90 \text{ per dozen, or } \$0.075 \text{ per orange.} \]


It is the same as the first harmonic mean, since we are assuming in our
computation that equal quantities are to be bought at each price. (Identical
results are obtained if the quotations are per orange instead of per
dozen oranges.) On the other hand, if we consider that 10 oranges may
be bought at $1.20 per dozen and 20 oranges may be bought at $0.60 per 
dozen, we have 


\[ \bar{X} = \frac{(\$1.20 \times 10) + (\$0.60 \times 20)}{30} = \$0.80 \text{ per dozen, or } \$0.067 \text{ per orange.} \]


This result is the same as obtained in our first and third calculations, since 
we have assumed that equal amounts of money are to be spent for each
grade of orange. 

In the above illustrations the harmonic mean has furnished no informa- 
tion not already available by use of the arithmetic mean. The harmonic 
mean may be useful, however, when data are customarily or conveniently 
given in terms of problems solved per minute, miles covered per hour, 
units purchased per dollar, and so forth. 
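The orange illustration may be retraced in Python as follows. The variable names are ours; the sketch merely confirms that the arithmetic mean of the "units per dollar" quotations corresponds to spending equal sums on each grade, while the harmonic mean corresponds to buying equal numbers of each grade.

```python
per_dollar = [10, 20]   # oranges obtainable for $1 in each grade

arithmetic = sum(per_dollar) / len(per_dollar)
harmonic = len(per_dollar) / sum(1 / x for x in per_dollar)

# Equal amounts of money: $1 spent on each grade.
oranges_bought = sum(per_dollar)                 # 10 + 20 oranges for $2.00
price_equal_money = 2.00 / oranges_bought        # dollars per orange

# Equal numbers of oranges: 15 bought from each grade.
cost_equal_units = sum(15 / x for x in per_dollar)   # $1.50 + $0.75
price_equal_units = cost_equal_units / 30            # dollars per orange

print(arithmetic, round(price_equal_money, 3))
print(round(harmonic, 2), round(price_equal_units, 3))
```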

The arithmetic mean and the harmonic mean give consistent results if 
proper consideration is given to (a) how the data are quoted and (b) what 
weights are to be used. Taking prices as an illustration, the table below 
sets forth the relationships. Expressions 1, 2, 3, 4 give results consistent 
with each other. Similarly, expressions I, II, III, IV give consistent 
results. 



If prices are         If the assumption is:
quoted in
terms of:             Equal amounts of money spent         Equal number of units of each
                      for each grade or commodity          grade or commodity bought at
                                                           each price

Price per unit        1. \(\bar{X}\), weighted by quantities       I. \(\bar{X}\), weighted by number
                         for equal amounts of money           of units (or equally)
                         (in this case, units per
                         dollar)

                      2. H, weighted by dollars            II. H, weighted by dollars for
                         (or equally)                          equal numbers of units (or
                                                               price per unit)

Units per dollar      3. \(\bar{X}\), weighted by dollars          III. \(\bar{X}\), weighted by dollars
                         (or equally)                           for equal numbers of units
                                                                (or price per unit)

                      4. H, weighted by quantities         IV. H, weighted by number of
                         for equal amounts of money            units (or equally)
                         (in this case, units per
                         dollar)

Consider commodity A as selling at 4 units for $1, or $0.25 each, and
commodity B as selling at 10 units for $1, or $0.10 each.
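If equal amounts of money are to be spent for each commodity:

1. \( \bar{X} = \dfrac{(0.25 \times 4) + (0.10 \times 10)}{14} = \dfrac{2.00}{14} = \$0.1429 \) per unit, or 7 for $1.

2. \( H = \dfrac{2}{\dfrac{1}{0.25} + \dfrac{1}{0.10}} = \dfrac{2}{\dfrac{7}{0.50}} = \dfrac{1.00}{7} = \$0.1429 \) per unit, or 7 for $1.

3. \( \bar{X} = \dfrac{(4 \times 1) + (10 \times 1)}{2} = \dfrac{14}{2} = 7 \) for $1, or $0.1429 per unit.

4. \( H = \dfrac{14}{4\left(\dfrac{1}{4}\right) + 10\left(\dfrac{1}{10}\right)} = \dfrac{14}{2} = 7 \) for $1, or $0.1429 per unit.

If equal numbers of units of each commodity are to be bought at each
price:

I. \( \bar{X} = \dfrac{(0.25 \times 1) + (0.10 \times 1)}{2} = \dfrac{0.35}{2} = \$0.175 \) per unit, or 5.71 for $1.

II. \( H = \dfrac{0.35}{0.25\left(\dfrac{1}{0.25}\right) + 0.10\left(\dfrac{1}{0.10}\right)} = \dfrac{0.35}{2} = \$0.175 \) per unit, or 5.71 for $1.

III. \( \bar{X} = \dfrac{(4 \times 0.25) + (10 \times 0.10)}{0.35} = \dfrac{2.00}{0.35} = 5.71 \) for $1, or $0.175 per unit.

IV. \( H = \dfrac{2}{\dfrac{1}{4} + \dfrac{1}{10}} = \dfrac{2}{\dfrac{14}{40}} = \dfrac{80}{14} = 5.71 \) for $1, or $0.175 per unit.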


From what has just been said it may be observed that (for either 
assumption), when averaging fractions (ratios) by the arithmetic or harmonic
method, we use the arithmetic mean if weights are in the same
terms as the denominator, the harmonic mean if weights are in the same
terms as the numerator. Of course, if weights are in the same terms as
the numerator, they may be converted into terms of the denominator and
the arithmetic mean employed.*
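The rule just stated can be tried out with two small helper functions. These helpers and their names are hypothetical, written only to reproduce expressions 1 to 4 and I to IV for commodities A and B; the weights passed to each call follow the table of relationships given earlier.

```python
def weighted_arithmetic_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

def weighted_harmonic_mean(values, weights):
    return sum(weights) / sum(w / v for v, w in zip(values, weights))

price = [0.25, 0.10]   # dollars per unit for A and B
units = [4, 10]        # units per dollar for A and B

# Equal amounts of money ($1 on each commodity): expressions 1 to 4.
e1 = weighted_arithmetic_mean(price, [4, 10])    # weights are units (denominator terms)
e2 = weighted_harmonic_mean(price, [1, 1])       # weights are dollars (numerator terms)
e3 = weighted_arithmetic_mean(units, [1, 1])     # weights are dollars (denominator terms)
e4 = weighted_harmonic_mean(units, [4, 10])      # weights are units (numerator terms)

# Equal numbers of units (one of each commodity): expressions I to IV.
eI = weighted_arithmetic_mean(price, [1, 1])
eII = weighted_harmonic_mean(price, [0.25, 0.10])
eIII = weighted_arithmetic_mean(units, [0.25, 0.10])
eIV = weighted_harmonic_mean(units, [1, 1])

print(round(e1, 4), round(e2, 4), e3, e4)
print(round(eI, 3), round(eII, 3), round(eIII, 2), round(eIV, 2))
```

In each call the arithmetic mean takes weights expressed in the denominator's terms and the harmonic mean takes weights expressed in the numerator's terms, which is why the results within each group agree.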
Suppose that a transaction consists of 40 handkerchiefs sold at 10 for
$1 and 60 handkerchiefs sold at 20 for $1. Now we are not interested in
either of the assumptions mentioned above. What we desire is the mean
price when 40 handkerchiefs sell at 10 for $1 and 60 sell at 20 for $1. 
Using the quotations as given (that is, in terms of number of units per 
dollar), we may use the harmonic mean with quantity weights. Thus 



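\[ H = \frac{40 + 60}{40\left(\dfrac{1}{10}\right) + 60\left(\dfrac{1}{20}\right)} = \frac{100}{7} = 14\tfrac{2}{7} \text{ per } \$1, \text{ or } \$0.07 \text{ each.} \]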


Still using the quotations in terms of units per dollar, we may obtain the 
same result by employing the arithmetic mean, if our weights are amounts 
of money spent for each grade. Thus 


\[ \bar{X} = \frac{(10 \times 4) + (20 \times 3)}{7} = \frac{100}{7} = 14\tfrac{2}{7} \text{ per } \$1, \text{ or } \$0.07 \text{ each.} \]


If we shift our quotations to price per unit, we have 40 handkerchiefs sold 
at $0.10 each and 60 sold at $0.05 each. Now, using the harmonic mean, 
we weight by amounts of money spent for each grade. Thus 



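\[ H = \frac{4 + 3}{4\left(\dfrac{1}{0.10}\right) + 3\left(\dfrac{1}{0.05}\right)} = \frac{7}{100} = \$0.07 \text{ each, or } 14\tfrac{2}{7} \text{ per } \$1. \]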


Finally, using the arithmetic mean of prices per unit and weighting by 
quantities sold, we have

\[ \bar{X} = \frac{(0.10 \times 40) + (0.05 \times 60)}{100} = \frac{7}{100} = \$0.07 \text{ each, or } 14\tfrac{2}{7} \text{ per } \$1. \]


Application (2). Occasionally a frequency distribution may be encountered
which is so skewed to the right that, when plotted in terms of the
reciprocals of the class mid-values, it assumes an approximately normal 
form. In such instances harmonic treatment may be indicated. Such 
cases are rather unusual, however, and will not be treated in this book. 

Application (3). An interesting and apparently valid application of
the harmonic mean is given in an article by Holbrook Working.* In his
study of the factors influencing the price of potatoes, Working uses the
harmonic mean, because, as he points out, a low price during part of a season
will be compensated only by a disproportionally high price during
the remainder of the season. To illustrate, we have selected the monthly
prices for one crop year and have shown them in Chart 9.5. When the
reciprocals or the logarithms are plotted, the curve is straighter than
when the arithmetic values are plotted, the reciprocals giving perhaps the
most nearly straight line. This indicates that the harmonic mean is not
inappropriate as a measure of the average price of potatoes during a
season.

* Holbrook Working, Factors Determining the Price of Potatoes in St. Paul and
Minneapolis, Technical Bulletin 10, University of Minnesota Agricultural Experiment
Station, pp. 9 and 10.

[Chart 9.5. Price of Potatoes per Bushel in Minneapolis and St. Paul,
September 1919-May 1920: A. Price, B. Logarithm of Price, C. Reciprocal of
Price. Vertical scale: price in cents. Data from Holbrook Working, ibid., p. 40.]

It is sometimes argued that the geometric mean should be used for
series of data having a definite lower limit and an indefinite upper limit.
One type of such data is price relatives, which, having a base of 100, may
fall to 0 but rise to ∞. The question is not so much one of the existence of
such limits as it is one of what values may actually occur and how the limits
are approached (arithmetically, geometrically, or reciprocally), and whether,
if we are dealing with a frequency distribution, the series is approximately
symmetrical in terms of X, skewed but approximately symmetrical in terms of
log X, or skewed but approximately normal in terms of \( \frac{1}{X} \).

In an arithmetic sense, a price drop of 33.3 per cent is offset by a price rise
of 33.3 per cent (of the original base), a decline of 50 per cent is offset by a
rise of 50 per cent, and a fall of 90 per cent is offset by a rise of 90 per
cent. Thus

\[ \frac{66.7 + 133.3}{2} = 100, \]

\[ \frac{50 + 150}{2} = 100, \]

\[ \frac{10 + 190}{2} = 100. \]
In a geometric sense, a price drop of 33.3 per cent is offset by a rise of
50 per cent (of the original base), a fall of 50 per cent is offset by a rise of
100 per cent, and a drop of 90 per cent is offset by a rise of 900 per cent.
Thus

\[ \sqrt{66.7 \times 150} = 100, \]

\[ \sqrt{50 \times 200} = 100, \]

\[ \sqrt{10 \times 1000} = 100. \]


In a reciprocal sense, a price drop of 33.3 per cent is offset by a rise of
100 per cent (of the original base), a fall of 50 per cent is offset by a
rise to ∞, and a fall of more than 50 per cent cannot be offset by any rise
however great. Thus

\[ \frac{2}{\dfrac{1}{66.7} + \dfrac{1}{200}} = 100, \]

\[ \frac{2}{\dfrac{1}{50} + \dfrac{1}{\infty}} = 100. \]
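The three senses of "offsetting" can be compared with a short sketch. The function `offsetting_relative` is a hypothetical helper of our own: given a price relative below the base of 100, it returns the relative which restores a mean of 100 in the arithmetic, geometric, or harmonic (reciprocal) sense.

```python
import math

def offsetting_relative(low, sense, base=100.0):
    """Relative which, averaged with `low`, gives a mean equal to `base`."""
    if sense == "arithmetic":
        return 2 * base - low
    if sense == "geometric":
        return base ** 2 / low
    if sense == "harmonic":
        reciprocal = 2 / base - 1 / low     # required value of 1/high
        if math.isclose(reciprocal, 0.0, abs_tol=1e-12):
            return math.inf                 # only an unlimited rise offsets the fall
        if reciprocal < 0:
            return None                     # no rise, however great, offsets the fall
        return 1 / reciprocal
    raise ValueError("sense must be 'arithmetic', 'geometric', or 'harmonic'")

for low in (200 / 3, 50, 10):
    print(round(low, 1),
          [offsetting_relative(low, s) for s in ("arithmetic", "geometric", "harmonic")])
```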


There are a number of other measures of central tendency which are 
of mathematical and theoretical rather than of practical interest. One of 
these is the quadratic mean: 


\[ \sqrt{\frac{\Sigma X^2}{N}}. \]


This is the square root of the arithmetic mean of the squares of the values. 
Unless all the values are the same, the quadratic mean exceeds the arith- 
metic mean. The quadratic mean is mentioned here because the concept 
is important. Although we do not use the term "quadratic" or "mean,"
we shall shortly compute the quadratic mean of the deviations from the
arithmetic mean. It will not be a measure of central tendency, but a
measure of dispersion; we shall call it the standard deviation, or s, and
its expression is

\[ s = \sqrt{\frac{\Sigma x^2}{N}}. \]


Symbols Used in Chapter 10 


AD: the average (or mean) deviation. 

\(\alpha_3\): lower-case Greek alpha, a measure of skewness using the third powers
of the X values. For \(\alpha_1\) and \(\alpha_2\), see footnote 10.

\(\alpha_4\): lower-case Greek alpha, a measure of kurtosis using the fourth powers
of the X values.

\(\beta_1\): lower-case Greek beta, a measure of skewness using the third powers
of the X values.

\(\beta_2\): lower-case Greek beta, a measure of kurtosis using the fourth powers
of the X values.

d: deviation of an X value from \(\bar{X}_d\).

d': deviation, in terms of class intervals, of an X value from \(\bar{X}_d\).
f: a frequency. 

k^: a measure of uniformity, the reciprocal of 
i: the class interval. 

M : used with s to indicate a specified multiple of 
Med: the median. 

Mo: the mode. 

\(\mu_1, \mu_2, \mu_3, \mu_4\): lower-case Greek mu; respectively, the first, second, third,
and fourth moments about \(\bar{X}\), with Sheppard's corrections. \(\mu_1 = \pi_1 = 0\)
and \(\mu_2 =\)

N: the number of items in a sample.

\(\nu_1, \nu_2, \nu_3, \nu_4\): lower-case Greek nu; respectively, the first, second, third, and
fourth moments about

\(P_1, P_2, \ldots, P_n\): the percentiles.

\(\pi_1, \pi_2, \pi_3, \pi_4\): lower-case Greek pi; respectively, the first, second, third, and
fourth moments about \(\bar{X}\). \(\pi_1 = 0\).

Q: the semi-interquartile raiige. 

\(Q_1, Q_2, Q_3\): the quartiles. \(Q_2\) = Med.

s: the standard deviation of a sample.

\(s^2\): the variance of a sample.

\(s_{cor}\): the standard deviation of a sample, with Sheppard's correction.

Sk: the Pearsonian measure of skewness.

\(Sk_Q\): a measure of skewness based on the quartiles.

\(\hat{\sigma}\): lower-case Greek sigma, "sigma caret" or "sigma hat," estimate of the
standard deviation of a population.

\(\sigma\): lower-case Greek sigma, the standard deviation of a population.

\(\Sigma\): upper-case Greek sigma, meaning "take the sum of."

V : the coefficient of variation. 

x: deviation of X from \(\bar{X}\).

X: a value in a series; also, the mid-value of a class in a frequency
distribution.

\(\bar{X}\): the arithmetic mean. In later chapters we shall distinguish between
the arithmetic mean of a sample, \(\bar{X}\), and the arithmetic mean of the
population,

\(\bar{X}_d\): a designated mean.

| |: disregard signs; thus, \(\Sigma|x|\) means "take the sum of the x values
without regard to signs."



CHAPTER 10 


Dispersion, Skewness, and Kurtosis 


In the preceding chapter we considered certain measures which attempted
to describe the central tendency of a frequency distribution.
There are other aspects of frequency distributions which are also important.
First we shall consider the dispersion, or spread of the data. Two
counties may each show an average yield of wheat of 15 bushels to the
acre; but, if the data are considered farm by farm, one county may exhibit
extreme values ranging from 10 to 20 bushels per acre, while the other may
show yields as low as 5 bushels per acre and as high as 25 bushels per
acre. If such a crude measure of dispersion be used, it is apparent
that there is greater uniformity of yield in the first county. Chart 10.1
shows two symmetrical curves which have the same mean but which differ
in respect to dispersion.

[Chart 10.1. Two Frequency Curves Having Different Dispersion.]

If a frequency curve or frequency distribution is not symmetrical, it is
said to be skewed, or asymmetrical. Most frequency distributions exhibit
more or less skewness. Chart 10.2 shows two curves, one of which is
symmetrical and one of which is skewed. The skewed curve is skewed to
the right, the direction in which the excess tail appears.

[Chart 10.2. A Curve Skewed to the Right (Solid Line) and a Symmetrical
Curve (Broken Line).]

Curves of frequency distributions may be symmetrical but may differ
from each other in regard to the amount of kurtosis present. The basis of
reference is the normal or mesokurtic curve discussed in Chapter 23. A
leptokurtic curve has a narrower central portion and higher tails than does
the normal curve. A comparison of these two is shown in Chart 10.3.
Chart 10.4 shows a platykurtic curve and a normal curve. As may be seen,
the platykurtic curve has a broader central portion and lower tails.

MEASURES OF ABSOLUTE 
DISPERSION 

The mean annual temperature at Lexington, Kentucky is 55.2 degrees.
The mean annual temperature at San Francisco, California is 55.7 degrees,
which is very little different from the temperature at Lexington. These two
figures do not, however, suffice to characterize this aspect of the climatic
conditions of the two cities. The temperature at Lexington has been known
to fall as low as -20 degrees and to rise as high as 108 degrees. In
San Francisco the lowest recorded temperature is 20 degrees and the
highest is 104 degrees. It is quite apparent that there is greater variability
of temperature at Lexington than at San Francisco.

Let us consider a second illustration. A buyer for a large department store
has been offered two types of electric lights for use in the store. The salesmen
each claim about the same average length of life for their bulbs. The
buyer obtains from a testing laboratory test data for 40-watt lamps of the
two makes and finds that the average life of each of the two kinds of bulbs
is about 1,000 hours. Examining the data further, however, shows that
in one batch of bulbs a lamp burned out at 325 hours while one lasted
1,570 hours. In the other batch one lamp lasted but 105 hours, while one
did not burn out until the expiration of 2,910 hours. This limited information
indicates a greater degree of uniformity among lamps of the first
batch.

[Chart 10.3. A Leptokurtic Curve (Solid Line) and a Normal or Mesokurtic
Curve (Broken Line).]

[Chart 10.4. A Platykurtic Curve (Solid Line) and a Normal or Mesokurtic
Curve (Broken Line).]

The range. The measurement of dispersion may be made in a crude 
form by referring to the lowest and the highest values, as was done in the 
preceding paragraphs. This is a very simple and easy-to-understand 
measure. The range gives a comprehensive value for the data in that it 
includes the limits within which all of the items occurred. However, the 
range has certain disadvantages. It fails to give any consideration to the
arrangement of the values between the two extreme values.1 Furthermore,
the range is misleading if either of the extreme values is an unusual
occurrence. 

Referring to the cadet-midshipmen's grades in Table 10.3, it is observed
that the range is 71.95 (the lower limit of the first class) to 89.95 (the
upper limit of the last class). If we have the array to refer to, as in
Table 8.2, the range may be given a little more accurately as 72.1 to 89.6.
The range from the frequency distribution merely tells us that no one
in the class received a grade below 71.95 or above 89.95. The range is
usually stated as the difference between the two extreme values. For the
cadet-midshipmen, 89.95 - 71.95 = 18.00. However, if only this single
figure is given, we do not know whether the range is from 0 to 18, or from
78 to 96, or what the limits may be.

The 10-90 percentile range. Sometimes we are interested in know- 
ing the range within which a certain proportion of the items fall. One 
such range, which is occasionally used in educational measurement, is the 
10-90 percentile range. This measure excludes the lowest 10 per cent 
and the highest 10 per cent, giving the two values between which the 
central 80 per cent of the items occur. Of course, the 10th percentile is 
the 1st decile, and the 90th percentile is the 9th decile. The measure is
usually referred to, however, as the 10-90 percentile range, rather than 
the 1-9 decile range, since the former carries more clearly the idea of the 
central 80 per cent. 

The 10-90 percentile range is not affected by extreme values as is the 
range. However, this measure has a very serious shortcoming in that it 
does not make use of the values of all the items. As a result, the values 
below the 10th percentile (or above the 90th percentile) could be massed 
closely together or spread out widely; the effect upon the 10-90 percentile 
range would be the same. Also, the values between the 10th percentile 
and the 90th percentile could be arranged in any conceivable manner so 
long as they are somewhere between the 10th and 90th percentiles. 


1 It must be obvious that when N = 2, this difficulty does not exist. It is of minor
importance for small samples drawn from a normal population.
The quartile deviation. In Chapter 9 mention was made of \(Q_1\)
and \(Q_3\), the lower and the upper quartiles. A measure of dispersion
based upon these values is termed the quartile deviation, or the semi-interquartile
range. It is given by

\[ Q = \frac{Q_3 - Q_1}{2}. \]

If a series is symmetrical, it is clear that \(Q_1\) and \(Q_3\) are equidistant from
the median. Therefore, if we measure ±Q from the median, we include
50 per cent of the items of the series, for we have measured back to \(Q_1\)
and \(Q_3\). If a series is skewed, as is usually true, we may take ±Q around
the median, and, while we shall not arrive at either \(Q_1\) or \(Q_3\), we may
expect to include approximately 50 per cent of the items unless the skewness
is great.

The quartile deviation, like the 10-90 percentile range, is not affected 
by extreme values and also fails to consider the values of all the items. 
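The three positional measures discussed so far (the range, the 10-90 percentile range, and the quartile deviation) may be sketched with Python's standard `statistics` module. The data are the 15 scores of Table 10.1. Note that `statistics.quantiles` with `method="inclusive"` uses its own interpolation rule for locating percentiles, which is an assumption of this sketch and need not agree exactly with the procedure of Chapter 9.

```python
import statistics

# The 15 scores of Table 10.1, in ascending order.
scores = [12, 21, 21, 23, 27, 28, 30, 34, 37, 39, 39, 39, 40, 49, 54]

def total_range(values):
    """Difference between the highest and the lowest value."""
    return max(values) - min(values)

def percentile_range_10_90(values):
    """The 10th and 90th percentiles (the 1st and 9th deciles)."""
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return deciles[0], deciles[-1]

def quartile_deviation(values):
    """Semi-interquartile range, Q = (Q3 - Q1) / 2."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 2

print(total_range(scores))
print(percentile_range_10_90(scores))
print(quartile_deviation(scores))
```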

The average deviation. The average deviation, or the mean deviation, 
as it is sometimes called, is usually measured in relation to the arithmetic 
mean. The average deviation is obtained by taking the sum of the devi- 
ations of the items from the arithmetic mean, without regard to signs, 
and dividing by the number of items. It will be recalled that \(\Sigma x = 0\),
and it is for this reason that the signs of the various x values are neglected.
Thus,

\[ AD = \frac{\Sigma |x|}{N}, \]

or, for a frequency distribution,

\[ AD = \frac{\Sigma f|x|}{N}, \]

where | | means that the signs are neglected. Because the sum of the
deviations (signs neglected) is a minimum when taken around the median, 
the mean deviation is sometimes computed in relation to the median. 
In practice, however, the mean is generally used and, if the series is sym- 
metrical, the resulting AD is the same. Since AD is of limited useful- 
ness compared to the measure of dispersion next discussed, the computa- 
tion of AD is not shown here. The determination of AD for a frequency 
distribution is illustrated in the first edition of this book on pages 236 
and 239. 

If a distribution is normal, 57.5 per cent of the items are included 
within the range of \(\bar{X}\) ± AD. If the distribution is moderately skewed,
this will be found to be approximately true.
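A minimal sketch of the average deviation follows. The function `average_deviation` and the five illustrative values are our own, not data from the text. The last lines check the 57.5 per cent figure, using the known fact that for a normal curve AD equals \( \sigma \sqrt{2/\pi} \).

```python
import math
import statistics

def average_deviation(values, center=None):
    """Mean of the deviations, signs neglected, about `center` (default: the mean)."""
    if center is None:
        center = statistics.fmean(values)
    return sum(abs(v - center) for v in values) / len(values)

values = [1, 2, 3, 4, 10]                                   # hypothetical illustration
ad_mean = average_deviation(values)                          # about the arithmetic mean
ad_median = average_deviation(values, statistics.median(values))   # about the median

# Share of a normal distribution lying within the mean ± AD.
unit_normal = statistics.NormalDist()
ad_normal = math.sqrt(2 / math.pi)                           # AD when sigma = 1
share = unit_normal.cdf(ad_normal) - unit_normal.cdf(-ad_normal)

print(ad_mean, ad_median, round(share, 3))
```

The deviation about the median comes out no larger than that about the mean, in keeping with the minimum property mentioned above.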

The standard deviation, ungrouped data. Instead of merely
neglecting the signs of the deviations from the arithmetic mean, we may
square the deviations, thereby making all of them positive. Thus, we
may have a measure

\[ s^2 = \frac{\Sigma x^2}{N}, \]

the variance or mean square deviation. (At a later point we shall use the
term variation to refer to \(\Sigma x^2\).) \(s^2\) is also known as the second moment,
\(\pi_2\), of the distribution, since the deviations have been raised to the second
power. We shall make use of the variance in later sections of the book.

At this point we are interested in the square root of this measure,

\[ s = \sqrt{\frac{\Sigma x^2}{N}}, \]

TABLE 10.1

Computation of Standard Deviation for Scores of 15 Persons in Recalling Trade
Names of Advertised Products by Use of the Expression \( s = \sqrt{\Sigma x^2 / N} \)

Subject    Score X       x           x^2

   1          12      -20.87       435.56
   2          21      -11.87       140.90
   3          21      -11.87       140.90
   4          23       -9.87        97.42
   5          27       -5.87        34.46
   6          28       -4.87        23.72
   7          30       -2.87         8.24
   8          34        1.13         1.28
   9          37        4.13        17.06
  10          39        6.13        37.58
  11          39        6.13        37.58
  12          39        6.13        37.58
  13          40        7.13        50.84
  14          49       16.13       260.18
  15          54       21.13       446.48

Total        493                 1,769.78

Data from S. M. Newhall and M. H. Heim, "Memory Value of Absolute Size in
Magazine Advertising," Journal of Applied Psychology, Vol. 13, 1929, pp. 62-75.
The above data were for advertisements of 150 square inches each, and each was
observed for 6 seconds. The maximum possible score was 81.

\[ \bar{X} = \frac{\Sigma X}{N} = \frac{493}{15} = 32.87, \]

\[ s = \sqrt{\frac{\Sigma x^2}{N}} = \sqrt{\frac{1{,}769.78}{15}} = \sqrt{117.98} = 10.9. \]
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wMch is termed the standard deviation or, occa.sionally, the root-meaii- 
square deviation. It has been pointed out previous!}^ that Sx- is a mini- 
mum when taken around the arithmetic mean." Therefore, the standard 
deviation is always computed in reference to the arithmetic mean. As 
the above expression indicates, the steps involved in computing s are : 

(1) Determine the deviation x of each item from X; 

(2) Square these deviations; 

(3) Total them; 

(4) Divide this sum by iV; 

(o) Take the square root. 

The computation of $ for a series of ungrouped data is shown in Table 
10.1. This procedure involves the computation of x for every item, and 
would be a rather laborious procedure if there were an appreciably larger 
number of items. The value of s may be obtained, without computing 
each Xf by means of the expression^ 



The computation of s by this shorter method is illustrated in Table 


10.2. Notice that the correction 


. /SX\^ . 


is subtracted. This is always 


true. The sum of the squared deviations is least when taken around X, 
We, however, took our deviations around some other value (0, in this 
instance), and these squared deviations are therefore too large. 

Referring to Table 10.1, it will be observed that the value of ^ was 
rounded to two decimals, and thus each value of x and is an approxi- 
mation If X and X are shown to sufficient digits, results by the two 
methods will be the same. Here, both methods yield 10.9. 
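The agreement of the two methods may be checked with a short Python sketch. The scores are those of Table 10.1; the variable names are ours, and full floating-point precision is carried rather than the two-decimal values of x shown in the table.

```python
from math import sqrt

# Scores of the 15 subjects in Table 10.1.
scores = [12, 21, 21, 23, 27, 28, 30, 34, 37, 39, 39, 39, 40, 49, 54]
n = len(scores)
mean = sum(scores) / n

# Long method: square the deviations from the mean, average, take the root.
s_long = sqrt(sum((x - mean) ** 2 for x in scores) / n)

# Short method: work with the original values and subtract the correction.
s_short = sqrt(sum(x * x for x in scores) / n - (sum(scores) / n) ** 2)

print(f"mean = {mean:.2f}, s (long) = {s_long:.2f}, s (short) = {s_short:.2f}")
```

Both computations divide by N, as in the text, so they describe the dispersion in the sample itself; Python's `statistics.pstdev` uses the same divisor.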

At this point it may be well to note that s measures the dispersion in 
the sample. In Chapter 24 we shall discuss $\sigma$, the population standard 
deviation, and an estimate of the population standard deviation based 
upon a sample. 

The standard deviation, grouped data. Before considering the 
properties of s, let us see how to compute s for a frequency distribution. 
Since frequencies are present, 

$$s = \sqrt{\frac{\Sigma f x^2}{N}},$$

where x now represents the deviation of a class mid-value from the mean. 
Table 10.3 illustrates the computation of s for the cadet-midshipmen's 
grades. It is fairly obvious that this method, involving the determination 
of a number of x values, is cumbersome. 

* For a demonstration, see Appendix S, section 10.1. 
† For proof of this expression, see Appendix S, section 10.2. 

TABLE 10.2 

Computation of Standard Deviation for Scores of 15 Persons in Recalling 
Trade Names of Advertised Products by Use of the Expression 
$s = \sqrt{\Sigma X^2 / N - (\Sigma X / N)^2}$ 

Subject    Score X      $X^2$ 
   1          12          144 
   2          21          441 
   3          21          441 
   4          23          529 
   5          27          729 
   6          28          784 
   7          30          900 
   8          34        1,156 
   9          37        1,369 
  10          39        1,521 
  11          39        1,521 
  12          39        1,521 
  13          40        1,600 
  14          49        2,401 
  15          54        2,916 
 Total       493       17,973 

Data from same source as Table 10.1. 

$$s = \sqrt{\frac{\Sigma X^2}{N} - \left(\frac{\Sigma X}{N}\right)^2} = \sqrt{\frac{17{,}973}{15} - \left(\frac{493}{15}\right)^2}$$

$$= \sqrt{1{,}198.20 - 1{,}080.22} = \sqrt{117.98} = 10.9.$$

A short method for s is available which allows us to take the mid-value 
of any class as the assumed mean, work with deviations around this value, 
and make the necessary correction. The expression is 

$$s = \sqrt{\frac{\Sigma f d^2}{N} - \left(\frac{\Sigma f d}{N}\right)^2}.$$

To further shorten the process, the deviations are taken in terms of 
classes, giving* 

$$s = i\sqrt{\frac{\Sigma f (d')^2}{N} - \left(\frac{\Sigma f d'}{N}\right)^2},$$

where d' indicates the deviation of a class mid-value from the assumed 
mean in terms of classes and i is the class interval. It is of interest to 
note that the correction factor $\left(\Sigma f d' / N\right)^2$ is the square of the correction 
factor used in computing the arithmetic mean by the short method. The 
computation of s by this shorter procedure is shown in Table 10.4. 

* For demonstration, see Appendix S, section 10.2. 


TABLE 10.3 

Computation of the Standard Deviation for Grades of the 1952 Graduating 
Class of the United States Merchant Marine Academy by Use of the 
Expression $s = \sqrt{\Sigma f x^2 / N}$ 

                Number of     Mid-values 
                 cadet-       of classes 
  Grade        midshipmen                   $x = X - \bar{X}$     $x^2$         $fx^2$ 
                    f              X 
72.0-73.9           7            72.95          -6.66          44.3556       310.4892 
74.0-75.9          31            74.95          -4.66          21.7156       673.1836 
76.0-77.9          42            76.95          -2.66           7.0756       297.1752 
78.0-79.9          54            78.95          -0.66           0.4356        23.5224 
80.0-81.9          33            80.95          +1.34           1.7956        59.2548 
82.0-83.9          24            82.95          +3.34          11.1556       267.7344 
84.0-85.9          22            84.95          +5.34          28.5156       627.3432 
86.0-87.9           8            86.95          +7.34          53.8756       431.0048 
88.0-89.9           4            88.95          +9.34          87.2356       348.9424 
  Total           225                                                      3,038.6500 

$\bar{X} = 79.61.$ 

$$s = \sqrt{\frac{\Sigma f x^2}{N}} = \sqrt{\frac{3{,}038.65}{225}} = \sqrt{13.5051} = 3.67.$$
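As a check on the grouped-data expressions, the following Python sketch computes s for the cadet-midshipmen's grades both from deviations about the mean and by the short method in class-interval units. The mid-values and frequencies are those of Table 10.3; the choice of 78.95 as assumed mean follows the text, and the variable names are ours.

```python
from math import sqrt

# Class mid-values and frequencies from Table 10.3.
mid_values = [72.95, 74.95, 76.95, 78.95, 80.95, 82.95, 84.95, 86.95, 88.95]
freqs = [7, 31, 42, 54, 33, 24, 22, 8, 4]
n = sum(freqs)

# Long method: deviations of the mid-values from the arithmetic mean.
mean = sum(f * x for f, x in zip(freqs, mid_values)) / n
s_long = sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mid_values)) / n)

# Short method: deviations d' in class-interval units from an assumed mean.
interval = 2.0
assumed_mean = 78.95
steps = [round((x - assumed_mean) / interval) for x in mid_values]  # -3 ... +5
nu1 = sum(f * d for f, d in zip(freqs, steps)) / n
nu2 = sum(f * d * d for f, d in zip(freqs, steps)) / n
s_short = interval * sqrt(nu2 - nu1 ** 2)
mean_short = assumed_mean + interval * nu1

print(f"mean = {mean:.2f}, s (long) = {s_long:.2f}, s (short) = {s_short:.2f}")
```

The short method requires only small whole-number deviations, which is why it was preferred for hand computation; the two results agree apart from floating-point rounding.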

TABLE 10.4 

Computation of the Standard Deviation for Grades of the 1952 Graduating 
Class of the United States Merchant Marine Academy by Use of the 
Expression $s = i\sqrt{\Sigma f (d')^2 / N - (\Sigma f d' / N)^2}$ 

                Number of 
                 cadet- 
  Grade        midshipmen       d'        fd'      $f(d')^2$ 
                    f 
72.0-73.9           7           -3        -21          63 
74.0-75.9          31           -2        -62         124 
76.0-77.9          42           -1        -42          42 
78.0-79.9          54            0          0           0 
80.0-81.9          33           +1        +33          33 
82.0-83.9          24           +2        +48          96 
84.0-85.9          22           +3        +66         198 
86.0-87.9           8           +4        +32         128 
88.0-89.9           4           +5        +20         100 
  Total           225                     +74         784 

$$s = i\sqrt{\frac{\Sigma f (d')^2}{N} - \left(\frac{\Sigma f d'}{N}\right)^2} = 2\sqrt{\frac{784}{225} - \left(\frac{74}{225}\right)^2}$$

$$= 2\sqrt{3.3763} = 2(1.837) = 3.67.$$

Properties of the standard deviation. Of the various measures of 
absolute dispersion which have been mentioned, the standard deviation 
(and its square, the variance) is by far the most important. It will be 
used in connection with various statistical methods described hereafter. 
One important consideration is that it is one of the factors involved in 
the equation for the normal curve and for various skewed curves, 
discussed in Chapter 23. It is also used in testing the reliability of certain 
statistical measures, in correlation, and in connection with business cycle 
analysis. 

The standard deviation is the most frequently used measure of the 
spread of a series of data. If $\pm s$ is measured from the arithmetic mean 
of a normal distribution, 68.27 per cent of the items are included; within 
the range of $\bar{X} \pm 2s$, 95.45 per cent are included; and within $\bar{X} \pm 3s$, 
99.73 per cent,* or nearly all, of the items are included. Chart 10.5 
illustrates what has just been said. The percentages just given refer to a 
normal curve. If the distribution is skewed, these percentages will be 
only approximately realized. For the cadet-midshipmen's grades (Table 
10.4), $\bar{X} \pm s$ is 79.61 ± 3.67 = 75.94 and 83.28. To ascertain the 
proportion of cadet-midshipmen in Table 10.4 who fall between 75.94 and 
83.28, we first determine the number occurring between 75.94 and 75.95 
(the upper limit of the second class), which is 0.2; then we include all of 
the frequencies in the next three classes, after which we compute the 
number between 81.95 (the lower limit of the sixth class) and 83.28, 
which is 16.0. The total is 145.2, or 64.5 per cent. Within $\bar{X} \pm 2s$ 
(that is, from 72.27 to 86.95), we find 215.9, or 96.0 per cent of the grades. 
Within $\bar{X} \pm 3s$ (68.60 to 90.62), all of the 225 grades are included. 
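The interpolation just described may be sketched in Python as follows. The sketch assumes, as the text does, that the items of each class are spread evenly between its limits (71.95 to 73.95, and so on); the function name `count_between` is ours. Because intermediate counts are not rounded to one decimal, the percentages may differ from those in the text by a few hundredths.

```python
# Class limits and frequencies for the cadet-midshipmen's grades (Table 10.4).
# Limits of 71.95, 73.95, ... follow those used in the text (e.g. 75.95).
lower_bounds = [71.95 + 2 * k for k in range(9)]
freqs = [7, 31, 42, 54, 33, 24, 22, 8, 4]
width = 2.0
n = sum(freqs)

def count_between(lo, hi):
    """Estimated number of items between lo and hi, assuming the items of
    each class are spread evenly over that class."""
    total = 0.0
    for lower, f in zip(lower_bounds, freqs):
        overlap = min(hi, lower + width) - max(lo, lower)
        if overlap > 0:
            total += f * overlap / width
    return total

mean, s = 79.61, 3.67  # rounded values used in the text
percent_within = {
    k: 100 * count_between(mean - k * s, mean + k * s) / n for k in (1, 2, 3)
}
for k, pct in percent_within.items():
    print(f"within mean ± {k}s: {pct:.1f} per cent")
```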


* See Appendix E, which gives the areas in one-half of the central portion of the 
normal curve. More exactly, 68.27 is twice 34.13447; 95.45 is twice 47.72499; 99.73 
is twice 49.86501. 

Chart 10.5. Proportion of Items Included within ±1s, ±2s, and ±3s of the 
Arithmetic Mean in a Normal Curve. 

In dealing with the normal curve in later chapters, we shall not 
confine ourselves to the proportionate areas included within ±s, ±2s, and 
±3s of the mean, but shall consider any desired multiples of s. For 
example, we shall later be interested in knowing that 95 per cent of the 
items may be found within $\bar{X} \pm 1.96s$ and that 99 per cent may occur 
within $\bar{X} \pm 2.58s$. Actually, we shall be more interested in the 
proportions occurring beyond the limits mentioned, that is, 5 per cent and 1 per 
cent. 
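The proportions quoted for the normal curve can be reproduced without a table of areas. The Python sketch below relies on the standard relation that the proportion within k standard deviations of the mean equals erf(k/√2); the function name `proportion_within` is ours.

```python
from math import erf, sqrt

def proportion_within(k):
    """Proportion of a normal distribution lying within k standard
    deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3, 1.96, 2.58):
    inside = proportion_within(k)
    print(f"±{k}s: {100 * inside:.2f} per cent inside, "
          f"{100 * (1 - inside):.2f} per cent beyond")
```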


Before leaving the topic of absolute dispersion, it may be of interest to 
point out that, for any series of values, no matter how they are 
distributed, it may be shown by Tchebycheff's inequality that the 
proportion of the values lying within the limits of $\bar{X} \pm Ms$ (where the value of 
M is greater than one) will be more than $1 - \frac{1}{M^2}$, and that the 
proportion falling beyond the limits of $\bar{X} \pm Ms$ will be less than $\frac{1}{M^2}$. If a 
distribution is unimodal, and if the difference between the mode and 
the mean does not exceed s, the Camp-Meidell inequality states that 
more than $1 - \frac{1}{2.25M^2}$ of the values are within $\bar{X} \pm Ms$ and that less 
than $\frac{1}{2.25M^2}$ of the values lie beyond $\bar{X} \pm Ms$. 
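To give some feeling for these limits, the two bounds may be evaluated for a few values of M. The Python sketch below simply evaluates the expressions given above; the function names are ours.

```python
def tchebycheff_lower_bound(m):
    """Minimum proportion within mean ± m*s for any series (m > 1)."""
    return 1 - 1 / m ** 2

def camp_meidell_lower_bound(m):
    """Minimum proportion within mean ± m*s for a unimodal series whose
    mode and mean differ by no more than s."""
    return 1 - 1 / (2.25 * m ** 2)

for m in (1.5, 2, 3):
    print(f"M = {m}: Tchebycheff > {tchebycheff_lower_bound(m):.4f}, "
          f"Camp-Meidell > {camp_meidell_lower_bound(m):.4f}")
```

Both limits are far below the corresponding proportions for a normal curve, since they must hold for much wider classes of distributions.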

The greater the dispersion of a series, the greater the value of s. As 
a measure of uniformity of the characteristic measured, the smaller the 
value of s, the greater the uniformity. To avoid this inverse relationship, 
a modification referred to as a measure of precision is sometimes 
used, especially with reference to the precision of a series of physical 
measurements. This measure is 

$$h = \frac{1}{s\sqrt{2}}.$$

It is not often used in statistical work in the social sciences. 

MEASURES OF RELATIVE DISPERSION 

In the preceding paragraphs we have discussed measures of absolute 
dispersion, all of which are expressed in terms of the units of the problem, 
which may be dollars, pounds, inches, percentages, and so forth. When 
we wish to compare the dispersions of two or more series, it may or may 
not be desirable to use such a measure. The comparison of dispersions 
of two or more series resolves itself into three possible situations: 

(1) The series to be compared may be expressed in the same units, 
and the means may be the same, or nearly the same, in size. The grades 
of the cadet-midshipmen showed a mean of 79.61 and a standard deviation 
of 3.67. If another graduating class showed $\bar{X} = 79.55$ and s = 
3.50, it is clear that the second class would exhibit less dispersion. 

(2) The series to be compared may be expressed in the same units, 
but the arithmetic means may differ. Some years ago the Goodyear Tire 
and Rubber Company developed a new type of cord for automobile tires 
which was designated Supertwist. The Supertwist cord was superior 
to ordinary cord in that it could stretch more and had a longer flex life. 
Tests made on cord as received from the cotton mill and prior to 
fabrication into tires showed for the flex life of Supertwist cord 

$\bar{X} = 138.64$ minutes, and s = 15.27 minutes; 

while for regular cord the figures were 

$\bar{X} = 87.66$ minutes, and s = 14.12 minutes. 

If we compare the two s values, it appears that Supertwist cord is more 
variable in respect to flex life than is regular cord. However, it must be 
noted that the average flex life of Supertwist is much greater than that of 
regular cord. Taking this factor into consideration, we may set up a 
measure of relative dispersion, 

$$V = \frac{s}{\bar{X}}.$$

This is the coefficient of variation and is usually expressed as a 
percentage. For Supertwist cord, 

$$V = \frac{15.27}{138.64} = 0.1101,$$ or 11.0 per cent; 

while for regular cord 

$$V = \frac{14.12}{87.66} = 0.1611,$$ or 16.1 per cent. 

It is thus apparent that the relative variation in flex life is much less for 
Supertwist cord than for regular cord. 
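The comparison of the two cords amounts to a single division for each series, as the following Python sketch shows. The figures are those quoted above; the function name is ours.

```python
def coefficient_of_variation(s, mean):
    """Relative dispersion V = s / mean."""
    return s / mean

# Flex life in minutes, as quoted in the text.
v_supertwist = coefficient_of_variation(15.27, 138.64)
v_regular = coefficient_of_variation(14.12, 87.66)

print(f"Supertwist cord: V = {v_supertwist:.4f}")
print(f"Regular cord:    V = {v_regular:.4f}")
```

It should be remembered that V has meaning only for series measured from a true zero, since it expresses s as a fraction of the mean.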

Chart 10.6 also illustrates the comparison of dispersions of two series 
having different mean values. Section A shows the curves of two dis- 
tributions having the same absolute dispersions but different relative 
dispersions. In section B are curves of two distributions having quite 
different absolute dispersions but the same relative dispersions. If the 
zero is shown on the horizontal scale, as in Chart 10.6, a very rough vis- 
ual impression may be had of the relative dispersion of a series. For this 
reason some statisticians think it is desirable to show the zero on the 
horizontal scale. This does not seem to be a very important matter, 
however, since relative dispersion can at best be visualized only approximately. 
Occasionally frequency distributions are formed with class intervals 
expressed, not in terms of original units, but as percentages of the mean, 
the interval being some convenient figure, such as 10 per cent of the 
mean. If two such distributions are plotted on one chart, it is easy to 
compare visually their relative dispersions. 

(3) The series to be compared may be expressed in different units. In 
such a case the standard deviations cannot be directly compared. A 
study of a large number of male industrial workers* revealed an average 
pulse rate of 81.1 beats per minute and a standard deviation of about 12.2 
beats per minute. Measurements of height showed $\bar{X} = 66.9$ inches and 
s = 2.7 inches. The measurements of height included a small number of 
men not measured as to pulse rate. Let us disregard this difficulty for 
the purposes of our illustration. Are the industrial workers more variable 
in respect to pulse rate or height? It is obvious that the two standard 
deviations, being in different units, cannot be compared. Computing the 
two coefficients of variation shows, for pulse rate, 

$$V = \frac{12.2}{81.1} = 0.149,$$ or 14.9 per cent, 

and, for height, 

$$V = \frac{2.7}{66.9} = 0.040,$$ or 4.0 per cent. 

It is clear that, for this group of men, pulse rate is subject to greater 
dispersion than is height. 

* Based on data in A Health Study of Ten Thousand Male Industrial Workers, pp. 45 
and 59, United States Public Health Service, Public Health Bulletin 162. 


[Chart 10.6: two panels, A and B; vertical axis: frequencies.] 

Chart 10.6. Comparisons of Dispersions of Series Having Different 
Arithmetic Means. A. Same absolute dispersion, different relative 
dispersion: left-hand curve, $\bar{X} = 33$, s = 10, V = 30.3 per cent; right-hand 
curve, $\bar{X} = 101$, s = 10, V = 9.9 per cent. B. Different absolute 
dispersion, same relative dispersion: left-hand curve, $\bar{X} = 50$, s = 5, V = 10 per 
cent; right-hand curve, $\bar{X} = 100$, s = 10, V = 10 per cent. (Sections A and 
B have different vertical scales since they are not intended to be compared. 
However, if the vertical scale of section B is expanded 50 per cent, all curves 
will have the same area.) 

Somewhat akin to our measurement of relative dispersion is the 
possibility of expressing a given value in terms of its divergence from the mean 
and also in terms of the dispersion of the series. Such a procedure is not 
especially useful when we are considering only one value or comparing 
two values from the same series. Its usefulness becomes apparent when 
we want to compare two values from different series and when those two 
series (1) differ in respect to $\bar{X}$ or s, or both, or (2) are expressed in 
different units. Suppose that a certain student has made a grade of 180 on 
an intelligence test, and that his group showed $\bar{X} = 160$ and s = 15. 
This same student made a grade of 86 in history, and the group showed 
$\bar{X} = 70$ and s = 12. We are interested in knowing whether his relative 
standing is higher in the intelligence test or in history. In the intelligence 
test he was 20 points above the mean, and in history he was 16 points 
above the mean. These deviations, however, are not comparable, but 
may be rendered so by dividing by their respective standard deviations. 
Thus, 

Intelligence test: $\frac{X - \bar{X}}{s} = \frac{180 - 160}{15} = \frac{+20}{15} = +1.33;$ 

History: $\frac{X - \bar{X}}{s} = \frac{86 - 70}{12} = \frac{+16}{12} = +1.33.$ 

It is apparent that the student shows the same relative standing in history 
and on the intelligence test, being 1.33s above the mean in each. The 
usefulness of this device is by no means limited to the educational field. 
It is, however, often used with test data and is then referred to as a 
"standard score." 
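The student's two standard scores may be verified with a few lines of Python. The grades, means, and standard deviations are those of the illustration above; the function name is ours.

```python
def standard_score(value, mean, s):
    """Deviation of a value from its group mean, in units of s."""
    return (value - mean) / s

z_intelligence = standard_score(180, 160, 15)
z_history = standard_score(86, 70, 12)

print(f"intelligence test: {z_intelligence:+.2f}")
print(f"history:           {z_history:+.2f}")
```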


SKEWNESS 

When a series is not symmetrical, it is said to be asymmetrical or 
skewed. In Chart 10.2 a skewed curve was shown in relation to a 
symmetrical one. The curve of cadet-midshipmen's grades (Chart 10.7) is 
also skewed. Measures of skewness indicate not only the amount of 
skewness but also the direction. A series is said to be skewed in the 
direction of the extreme values, or, speaking in terms of the curve, in 
the direction of the excess tail. Thus the two curves referred to above 
are both skewed positively, or to the right. Most skewed curves encoun- 
tered in the social sciences are skewed to the right. Only rarely do we 
find curves skewed to the left, as in Chart 10.8, and even more rarely do 
we find data characteristically skewed to the left. 

Many series, however, are characteristically skewed to the right. 
Examples are frequency distributions of wages or salaries, use of elec- 
tricity (see Chart 23.13), weights of adult male human beings, and 
numerous other variables. Distributions of grades are apt to be mod- 
erately skewed to the right, or nearly symmetrical. In the case of the 
cadet-midshipmen’s grades, the skewness is partly due to the fact that 
we are considering only those men who had survived the previous three 
years, during which some of the less able had been dropped. The 
distribution of ages at death of the American inventors in Chart 10.8 may 
be characteristically skewed to the left, since younger men do not often 
have enough inventions to their credit to be classified as "inventors," or 
the skewness may be due to the fact that a time factor is present: almost 
one-fifth of the inventors included in this study were born before 1800. 

[Chart 10.7: vertical axis: number of cadet-midshipmen.] 

Chart 10.7. Location of Arithmetic Mean, Median, and Mode for Grades 
of the 1952 Graduating Class of the United States Merchant Marine 
Academy. 

[Chart 10.8: vertical axis: number of inventors; horizontal axis: age in years.] 

Chart 10.8. Age at Death of 371 American Inventors. Data 
from "Bio-Social Characteristics of American Inventors," by Sanford 
Winston, American Sociological Review, Vol. 2, No. 6, pp. 837-849. 

Pearsonian measure of skewness. It was pointed out in the 
preceding chapter that the mode is not influenced by the presence of extreme 
values, the median is influenced by their position only, and the arithmetic 
mean is influenced by the size of the extremes. Consequently we could 
make use of the mode and the mean to measure skewness. We might 
say, then, that skewness = mean - mode. But there are some 
shortcomings of such a measure. In the first place, being a measure of 
absolute skewness, it would be in terms of the units of the problem. 
Furthermore, it would have much different meaning for a series of small 
dispersion than for a widely dispersed series. Statisticians almost never 
use a measure of absolute skewness, preferring a measure of relative 
skewness. The measure just mentioned may be put into relative terms 
and the two difficulties overcome by dividing by s. Now 

$$\text{Skewness} = \frac{\bar{X} - Mo}{s}.$$

This gives us a relative measure with positive sign when skewness is to 
the right, and with negative sign when skewness is to the left. There is, 
however, another important difficulty growing out of the fact that the 
mode for most frequency distributions is only an approximation. The 
median may be more satisfactorily located, and therefore we use the 
measure* 

$$Sk = \frac{3(\bar{X} - Med)}{s}.$$

In the preceding chapter it was found that $\bar{X} = 79.61$ and Med = 79.15 
for the cadet-midshipmen's grades. In this chapter the value of s was 
ascertained to be 3.67. The skewness, then, is 

$$Sk = \frac{3(79.61 - 79.15)}{3.67} = +0.376.$$

This may be considered as a moderate degree of skewness, since the 
measure varies within the limits† of ±3. It should be added that values as 
large as ±1 are rather unusual. 

* The presence of the 3 in the expression is explained as follows: Karl Pearson 
showed empirically that, in moderately skewed distributions of a continuous variable, 
the median tends to fall about 2/3 of the distance from the mode toward the mean. 
Consequently he wrote $Mo = \bar{X} - 3(\bar{X} - Med)$ and, substituting this expression 
for the mode in the measure of skewness, obtained 

$$Sk = \frac{\bar{X} - [\bar{X} - 3(\bar{X} - Med)]}{s} = \frac{3(\bar{X} - Med)}{s}.$$

† Harold Hotelling and Leonard M. Solomons ("The Limits of a Measure of 
Skewness," Annals of Mathematical Statistics, May 1932, pp. 141-142) have shown that 
$\frac{\bar{X} - Med}{s}$ lies between ±1. 

TABLE 10.5 

Computation of Various Measures for Age at Death of 371 American 
Inventors 

Age at death in years      f       d'      fd'     $f(d')^2$    $f(d')^3$ 
 35 and under  40          3       -6      -18        108        -648 
 40 and under  45          6       -5      -30        150        -750 
 45 and under  50         12       -4      -48        192        -768 
 50 and under  55         16       -3      -48        144        -432 
 55 and under  60         26       -2      -52        104        -208 
 60 and under  65         40       -1      -40         40        - 40 
 65 and under  70         50        0        0          0           0 
 70 and under  75         56        1       56         56          56 
 75 and under  80         62        2      124        248         496 
 80 and under  85         55        3      165        495       1,485 
 85 and under  90         25        4      100        400       1,600 
 90 and under  95         17        5       85        425       2,125 
 95 and under 100          2        6       12         72         432 
100 and over*              1        7        7         49         343 
 Total                   371              +313      2,483      +3,691 

* This class assumed to have its mid-value at 102.5. 

Data from Sanford Winston, "Bio-social Characteristics of American Inventors," 
American Sociological Review, Vol. 2, No. 6, p. 848, and by correspondence. 

$$\frac{N}{2} = \frac{371}{2} = 185.5.$$

$$Med = 70 + \frac{32.5}{56} \times 5 = 72.90 \text{ years}.$$

$$\bar{X} = 67.5 + \frac{313}{371} \times 5 = 71.72 \text{ years}.$$

$$s = i\sqrt{\pi_2} = 5\sqrt{5.980950} = 12.23 \text{ years}.$$

$$\nu_1 = \frac{\Sigma f d'}{N} = \frac{+313}{371} = +0.843666.$$

$$\nu_2 = \frac{\Sigma f (d')^2}{N} = \frac{2{,}483}{371} = 6.692722.$$

$$\nu_3 = \frac{\Sigma f (d')^3}{N} = \frac{+3{,}691}{371} = +9.948787.$$

$$\pi_1 = 0.$$

$$\pi_2 = \nu_2 - \nu_1^2 = 6.692722 - (0.843666)^2 = 5.980950.$$

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = +9.948787 - 3(0.843666)(6.692722) + 2(0.843666)^3 = -5.789483.$$

For the data of age at death of the American inventors, it is shown 
under Table 10.5 that $\bar{X} = 71.72$ years, while Med = 72.90 years and 
s = 12.23 years. The Pearsonian measure of skewness is 

$$Sk = \frac{3(71.72 - 72.90)}{12.23} = -0.289.$$
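The two Pearsonian measures may be checked with a brief Python sketch. It uses the rounded means, medians, and standard deviations quoted in the text; the function name is ours.

```python
def pearson_skewness(mean, median, s):
    """Pearsonian measure of skewness, Sk = 3(mean - median) / s."""
    return 3 * (mean - median) / s

sk_grades = pearson_skewness(79.61, 79.15, 3.67)       # cadet-midshipmen
sk_inventors = pearson_skewness(71.72, 72.90, 12.23)   # American inventors

print(f"grades:    Sk = {sk_grades:+.3f}")
print(f"inventors: Sk = {sk_inventors:+.3f}")
```

The signs agree with the charts: the grades are skewed to the right and the ages at death to the left.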


Measures of skewness based on quartiles and percentiles. 
Skewness may also be measured by means of the quartile measure of 
skewness, 

$$\frac{(Q_3 - Med) - (Med - Q_1)}{Q_3 - Q_1} = \frac{Q_1 + Q_3 - 2Med}{Q_3 - Q_1},$$

and by use of an expression employing the 10th and 90th percentiles, 

$$\frac{(P_{90} - Med) - (Med - P_{10})}{P_{90} - P_{10}} = \frac{P_{10} + P_{90} - 2Med}{P_{90} - P_{10}}.$$

Since these measures suffer from shortcomings similar to those previously 
mentioned for measures of dispersion based on quartiles and percentiles, 
they are not altogether satisfactory measures of skewness, and no further 
consideration will be given to them here. 

Measure of skewness based on the third moment. We have seen 
that the most satisfactory measure of dispersion is the standard deviation, 
which is based upon the second moment about the mean, 

$$\pi_2 = \frac{\Sigma x^2}{N}, \quad \text{and} \quad s = \sqrt{\pi_2}.$$

A measure of skewness may be obtained by making use of the third 
moment about the mean, 

$$\pi_3 = \frac{\Sigma x^3}{N}.$$

It will be recalled that the first moment about the mean, 

$$\pi_1 = \frac{\Sigma x}{N},$$

is always zero. However, the third moment about the mean is not zero 
unless the distribution is symmetrical about the mean. Cubing a 
deviation does not change its sign. It does, however, have a disproportionately 
large effect on large deviations. As illustrations, consider the two sets of 
data given in Tables 10.6 and 10.7, the first of which is symmetrical 
around a mean of 6, while the second is not symmetrical around a mean of 
6. Both sets of data have 

$$\pi_1 = \frac{\Sigma x}{N} = 0,$$

and the data of Table 10.6 have 

$$\pi_3 = \frac{\Sigma x^3}{N} = 0.$$

But the figures in Table 10.7 show 

$$\pi_3 = \frac{\Sigma x^3}{N} = +6.$$

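TABLE 10.6 

Computation of First and Third Moments of a Symmetrical Series 

   X        x       $x^3$ 
   2       -4       -64 
   4       -2       - 8 
   6        0         0 
   8       +2       + 8 
  10       +4       +64 
 Total      0         0 

$$\pi_1 = \frac{\Sigma x}{N} = \frac{0}{5} = 0. \qquad \pi_3 = \frac{\Sigma x^3}{N} = \frac{0}{5} = 0.$$

TABLE 10.7 

Computation of First and Third Moments of an Asymmetrical Series 

   X        x       $x^3$ 
   3       -3       -27 
   4       -2       - 8 
   6        0         0 
   7       +1       + 1 
  10       +4       +64 
 Total      0       +30 

$$\pi_1 = \frac{\Sigma x}{N} = \frac{0}{5} = 0. \qquad \pi_3 = \frac{\Sigma x^3}{N} = \frac{+30}{5} = +6.$$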
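The first and third moments of the two small series may be verified directly from the definition. In the Python sketch below, the data are those of Tables 10.6 and 10.7; the function name `moment_about_mean` is ours.

```python
def moment_about_mean(values, order):
    """Moment of the given order about the arithmetic mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** order for v in values) / len(values)

symmetrical = [2, 4, 6, 8, 10]    # Table 10.6
asymmetrical = [3, 4, 6, 7, 10]   # Table 10.7

for name, series in (("Table 10.6", symmetrical), ("Table 10.7", asymmetrical)):
    print(name,
          "first moment:", moment_about_mean(series, 1),
          "third moment:", moment_about_mean(series, 3))
```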


To compute the third moment of a frequency distribution, 

$$\pi_3 = \frac{\Sigma f x^3}{N},$$

taking the actual deviations from the arithmetic mean, cubing them, 
multiplying by the frequencies, summing, and dividing by N, would be 
laborious. As shown in Appendix S, section 10.2, the second moment, $s^2$ 
or $\pi_2$, can be obtained by a short process. In terms of class intervals 
squared, 

$$\pi_2 = \frac{\Sigma f (d')^2}{N} - \left(\frac{\Sigma f d'}{N}\right)^2.$$

The value of the third moment (in terms of class intervals raised to the 
third power) is given by* 

$$\pi_3 = \frac{\Sigma f (d')^3}{N} - 3\left(\frac{\Sigma f d'}{N}\right)\left(\frac{\Sigma f (d')^2}{N}\right) + 2\left(\frac{\Sigma f d'}{N}\right)^3.$$

Or, letting $\nu_1 = \frac{\Sigma f d'}{N}$, $\nu_2 = \frac{\Sigma f (d')^2}{N}$, and $\nu_3 = \frac{\Sigma f (d')^3}{N}$, 

$$\pi_2 = \nu_2 - \nu_1^2$$

and 

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3.$$

Obviously, $\pi_3$ is a measure of absolute skewness. The measure of 
relative skewness is 

$$\beta_1 = \frac{\pi_3^2}{\pi_2^3},$$

where both numerator and denominator are in terms of class intervals 
raised to the sixth power. Skewness is also sometimes measured by 
$\alpha_3$, where† 

$$\alpha_3 = \sqrt{\beta_1} = \frac{\pi_3}{\sqrt{\pi_2^3}}.$$

$\alpha_3$ may be given the sign accompanying $\pi_3$. We shall make use of $\alpha_3$ in 
fitting a skewed curve in Chapter 23. 

The values of the second and third moments for the data of 
cadet-midshipmen's grades are shown below Table 10.8. From these we obtain 

$$\beta_1 = \frac{\pi_3^2}{\pi_2^3} = \frac{(2.642053)^2}{(3.376276)^3} = 0.18.$$

Similarly, the second and third moments for the age at death of the 
American inventors have been computed in Table 10.5. From these we 
obtain 

$$\beta_1 = \frac{(-5.789483)^2}{(5.980950)^3} = 0.16.$$

Since $\pi_3 = 0$ when no skewness is present, it follows that a perfectly 
symmetrical series will have $\beta_1 = 0$. The greater the value of $\beta_1$, the 
more skewness there is in a series. At this point we are not in a position 
to say whether either of the two values just given for $\beta_1$ is significantly 
greater than zero. We shall consider this problem in Chapter 26. 

* See Appendix S, section 10.3. 

† No previous mention has been made of $\alpha_1$ or $\alpha_2$. For any series of figures, 
$\alpha_1 = \frac{\pi_1}{\sqrt{\pi_2}} = 0$; $\alpha_2 = \frac{\pi_2}{\sqrt{\pi_2^2}} = 1$. 

TABLE 10.8 

Computation of the First Three Moments for Grades of the 
1952 Graduating Class of the United States Merchant Marine 
Academy 

                Number of 
                 cadet- 
  Grade        midshipmen       d'        fd'      $f(d')^2$     $f(d')^3$ 
                    f 
72.0-73.9           7           -3        -21          63         - 189 
74.0-75.9          31           -2        -62         124         - 248 
76.0-77.9          42           -1        -42          42         -  42 
78.0-79.9          54            0          0           0             0 
80.0-81.9          33           +1        +33          33         +  33 
82.0-83.9          24           +2        +48          96         + 192 
84.0-85.9          22           +3        +66         198         + 594 
86.0-87.9           8           +4        +32         128         + 512 
88.0-89.9           4           +5        +20         100         + 500 
  Total           225                     +74         784        +1,352 

$$\nu_1 = \frac{\Sigma f d'}{N} = \frac{+74}{225} = +0.328889.$$

$$\nu_2 = \frac{\Sigma f (d')^2}{N} = \frac{784}{225} = 3.484444.$$

$$\nu_3 = \frac{\Sigma f (d')^3}{N} = \frac{+1{,}352}{225} = +6.008889.$$

$$\pi_1 = 0.$$

$$\pi_2 = \nu_2 - \nu_1^2 = 3.484444 - (0.328889)^2 = 3.376276.$$

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = 6.008889 - 3(0.328889)(3.484444) + 2(0.328889)^3 = 2.642053.$$
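The short-method moments and the resulting values of $\beta_1$ may be reproduced with the Python sketch below. The frequencies and class deviations are those of Tables 10.8 and 10.5; the function names are ours, and the moments are carried at full precision, so the last decimal may differ slightly from the six-place figures in the tables.

```python
def moments_about_mean(freqs, steps):
    """Second and third moments about the mean, in class-interval units,
    from moments about an assumed mean (deviations d' given by `steps`)."""
    n = sum(freqs)
    nu1, nu2, nu3 = (
        sum(f * d ** k for f, d in zip(freqs, steps)) / n for k in (1, 2, 3)
    )
    pi2 = nu2 - nu1 ** 2
    pi3 = nu3 - 3 * nu1 * nu2 + 2 * nu1 ** 3
    return pi2, pi3

def beta1(pi2, pi3):
    """Relative skewness, the third moment squared over the second cubed."""
    return pi3 ** 2 / pi2 ** 3

grades = moments_about_mean([7, 31, 42, 54, 33, 24, 22, 8, 4], range(-3, 6))
inventors = moments_about_mean(
    [3, 6, 12, 16, 26, 40, 50, 56, 62, 55, 25, 17, 2, 1], range(-6, 8)
)

print(f"grades:    pi2 = {grades[0]:.6f}, pi3 = {grades[1]:.6f}, "
      f"beta1 = {beta1(*grades):.2f}")
print(f"inventors: pi2 = {inventors[0]:.6f}, pi3 = {inventors[1]:.6f}, "
      f"beta1 = {beta1(*inventors):.2f}")
```

Since $\beta_1$ is a ratio of like powers of the class interval, it is unaffected by the units in which the deviations are expressed.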


KURTOSIS 

Chart 10.9 shows a leptokurtic distribution. A platykurtic distribution 
is shown in Chart 10.10. The normal curve is designated mesokurtic.* 
The degree of kurtosis present in a series may be measured by making use 
of the fourth moment, 

$$\pi_4 = \frac{\Sigma x^4}{N},$$

or, for a frequency distribution, 

$$\pi_4 = \frac{\Sigma f x^4}{N}.$$

By a procedure similar to that given in Appendix S, section 10.3, it may 
be shown that 

$$\pi_4 = \frac{\Sigma f (d')^4}{N} - 4\left(\frac{\Sigma f d'}{N}\right)\left(\frac{\Sigma f (d')^3}{N}\right) + 6\left(\frac{\Sigma f d'}{N}\right)^2\left(\frac{\Sigma f (d')^2}{N}\right) - 3\left(\frac{\Sigma f d'}{N}\right)^4,$$

or, letting $\nu_4 = \frac{\Sigma f (d')^4}{N}$, 

$$\pi_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4.$$

Now $\pi_4$ gives an absolute expression for kurtosis. This may be put 
into relative terms by dividing by $\pi_2^2$. The measure is known as $\beta_2$ or 
$\alpha_4$, and 

$$\beta_2 = \alpha_4 = \frac{\pi_4}{\pi_2^2},$$

where both numerator and denominator are in terms of class intervals 
raised to the fourth power. This expression has a value of 3.0 for the 
normal curve. For a platykurtic curve, $\beta_2 < 3.0$. For a leptokurtic 
curve, $\beta_2 > 3.0$. 

* Kurtos = humpbacked; thus, humped or unimodal. Lepto = slender, narrow. 
Platy = broad, wide, flat. Meso = in the middle, intermediate. 

[Chart 10.9: vertical axis: number of houses.] 

Chart 10.9. Cost of New Five-Room House and Lot to Purchaser 
(Solid Line) and Normal Curve (Broken Line) Having Same N, $\bar{X}$, and 
s, Cleveland, 1924. Based on data of Table 10.9. 

[Chart 10.10: vertical axis: percentage frequencies.] 

Chart 10.10. Length of Life of a Group of Electric Lamps (Solid Line) 
and Normal Curve (Broken Line) Having Same N, $\bar{X}$, and s. Based 
on data of Table 10.10. The tails of the normal curve are not shown. 
The left tail would cross the Y axis. 

The leptokurtic curve of Chart 10.9 is shown in comparison with a 
normal curve having the same N, $\bar{X}$, and s. In Table 10.9 the moments of 
this distribution have been computed and $\beta_2 = 4.46$. 

The platykurtic curve in Chart 10.10 is also shown in relation to a 
normal curve having the same N, $\bar{X}$, and s. The moments of the 
platykurtic series are shown in Table 10.10, and from these $\beta_2$ is found to be 
2.22. 

TABLE 10.9 

Computation of First Four Moments and of $\beta_2$ for Cost of New 
5-Room Wood House and Lot to Purchaser, Cleveland, 1924 

    Cost 
(mid-values)      f      d'      fd'     $f(d')^2$    $f(d')^3$    $f(d')^4$ 
  $ 1,500         2      -5      -10        50        -250        1,250 
    2,500         1      -4      - 4        16        - 64          256 
    3,500         2      -3      - 6        18        - 54          162 
    4,500         6      -2      -12        24        - 48           96 
    5,500        16      -1      -16        16        - 16           16 
    6,500        27       0        0         0           0            0 
    7,500        16       1       16        16          16           16 
    8,500         7       2       14        28          56          112 
    9,500         3       3        9        27          81          243 
   10,500         1       4        4        16          64          256 
   11,500         1       5        5        25         125          625 
  Total          82                0       236        - 90        3,032 

Data from Frank R. Garfield and William M. Hood, "Construction Costs and Real 
Property Values," Journal of the American Statistical Association, Vol. 32, No. 200, 
December 1937, p. 647. Data are those shown in Chart I for 5-room wood houses. 

$$\nu_1 = \frac{\Sigma f d'}{N} = \frac{0}{82} = 0.$$

$$\nu_2 = \frac{\Sigma f (d')^2}{N} = \frac{236}{82} = 2.878049.$$

$$\nu_3 = \frac{\Sigma f (d')^3}{N} = \frac{-90}{82} = -1.097561.$$

$$\nu_4 = \frac{\Sigma f (d')^4}{N} = \frac{3{,}032}{82} = 36.975610.$$

$$\pi_1 = 0.$$

$$\pi_2 = \nu_2 - \nu_1^2 = 2.878049.$$

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = -1.097561.$$

$$\pi_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4 = 36.975610.$$

$$\beta_2 = \frac{\pi_4}{\pi_2^2} = \frac{36.975610}{(2.878049)^2} = 4.46.$$

Note: The assumed mean ($6,500) and the mean coincide, resulting in a value of 0 
for $\nu_1$. There are therefore no differences between the $\nu$ and $\pi$ values, since $\nu_1^2 = 0$, 
$\nu_1^3 = 0$, $\nu_1^4 = 0$, $\nu_1\nu_2 = 0$, etc. 
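The values of $\beta_2$ quoted for the two charts may be verified with the following Python sketch, which applies the short-method expression for $\pi_4$ to the frequencies of Tables 10.9 and 10.10. The function name `beta2` and the variable names are ours.

```python
def beta2(freqs, steps):
    """Relative kurtosis, the fourth moment about the mean divided by the
    square of the second, from class deviations d' about an assumed mean."""
    n = sum(freqs)
    # nu[k] is the k-th moment about the assumed mean (nu[0] is 1).
    nu = [sum(f * d ** k for f, d in zip(freqs, steps)) / n for k in range(5)]
    pi2 = nu[2] - nu[1] ** 2
    pi4 = nu[4] - 4 * nu[1] * nu[3] + 6 * nu[1] ** 2 * nu[2] - 3 * nu[1] ** 4
    return pi4 / pi2 ** 2

# Table 10.9: house costs, d' running from -5 to +5.
houses = beta2([2, 1, 2, 6, 16, 27, 16, 7, 3, 1, 1], range(-5, 6))

# Table 10.10: lamp lives (percentage frequencies), d' from -9 to +10.
lamp_freqs = [1.0, 1.5, 3.1, 4.4, 5.0, 5.7, 6.6, 7.3, 7.6, 7.8,
              7.8, 7.6, 7.3, 6.6, 5.7, 5.0, 4.4, 3.1, 1.5, 1.0]
lamps = beta2(lamp_freqs, range(-9, 11))

print(f"houses: beta2 = {houses:.2f} (leptokurtic if above 3.0)")
print(f"lamps:  beta2 = {lamps:.2f} (platykurtic if below 3.0)")
```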

When a deviation is raised to a fourth or a second power, its sign 
becomes positive. The fourth power increases extreme deviations dispro- 
tortioiiately in comparison with raising them to the second power. Con- 
sequentty the narrower the shoulders of a distribution and the longer the 
tails, the greater will be in relation to tt?. 

In Chapter 26 we shall consider a method of ascertaining whether a
value of $\beta_2$ is significantly less than or greater than 3.0.

TABLE 10.10

Computation of First Four Moments and of $\beta_2$ for Length of Life of a Group
of Electric Lamps

Length of life    Percentage
   in hours       frequencies
 (mid-values)        $f$       $d'$     $fd'$    $f(d')^2$    $f(d')^3$     $f(d')^4$

       50            1.0        -9      -9.0        81.0       -729.0       6,561.0
      150            1.5        -8     -12.0        96.0       -768.0       6,144.0
      250            3.1        -7     -21.7       151.9     -1,063.3       7,443.1
      350            4.4        -6     -26.4       158.4       -950.4       5,702.4
      450            5.0        -5     -25.0       125.0       -625.0       3,125.0
      550            5.7        -4     -22.8        91.2       -364.8       1,459.2
      650            6.6        -3     -19.8        59.4       -178.2         534.6
      750            7.3        -2     -14.6        29.2        -58.4         116.8
      850            7.6        -1      -7.6         7.6         -7.6           7.6
      950            7.8         0       0           0            0             0
    1,050            7.8         1       7.8         7.8          7.8           7.8
    1,150            7.6         2      15.2        30.4         60.8         121.6
    1,250            7.3         3      21.9        65.7        197.1         591.3
    1,350            6.6         4      26.4       105.6        422.4       1,689.6
    1,450            5.7         5      28.5       142.5        712.5       3,562.5
    1,550            5.0         6      30.0       180.0      1,080.0       6,480.0
    1,650            4.4         7      30.8       215.6      1,509.2      10,564.4
    1,750            3.1         8      24.8       198.4      1,587.2      12,697.6
    1,850            1.5         9      13.5       121.5      1,093.5       9,841.5
    1,950            1.0        10      10.0       100.0      1,000.0      10,000.0

    Total          100.0               +50.0     1,967.2     +2,925.8      86,650.0


Data from Robley Winfrey and Edwin B. Kurtz, Life Characteristics of Physical Property, Bulletin 103, 
Iowa Engineering Experiment Station, p. 58, Property Group 28~2. 


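$$\nu_1 = \frac{\Sigma fd'}{N} = \frac{+50.0}{100.0} = +0.50.$$

$$\nu_2 = \frac{\Sigma f(d')^2}{N} = \frac{1{,}967.2}{100.0} = 19.672.$$

$$\nu_3 = \frac{\Sigma f(d')^3}{N} = \frac{+2{,}925.8}{100.0} = +29.258.$$

$$\nu_4 = \frac{\Sigma f(d')^4}{N} = \frac{86{,}650.0}{100.0} = 866.500.$$

$$\pi_1 = 0.$$

$$\pi_2 = \nu_2 - \nu_1^2 = 19.672 - (0.50)^2 = 19.422.$$

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = 29.258 - 3(0.50)(19.672) + 2(0.50)^3 = 0.$$

$$\pi_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4$$
$$= 866.500 - 4(0.50)(29.258) + 6(0.50)^2(19.672) - 3(0.50)^4 = 837.3045.$$

$$\beta_2 = \frac{\pi_4}{\pi_2^2} = \frac{837.3045}{(19.422)^2} = 2.22.$$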
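
The arithmetic beneath Table 10.10 can be reproduced with a short program. The following is a minimal sketch in Python; the function names are illustrative and are not part of the text, while the frequencies and deviations are those of the table.

```python
# Table 10.10: percentage frequencies f and deviations d' (class intervals).
f = [1.0, 1.5, 3.1, 4.4, 5.0, 5.7, 6.6, 7.3, 7.6, 7.8,
     7.8, 7.6, 7.3, 6.6, 5.7, 5.0, 4.4, 3.1, 1.5, 1.0]
d = list(range(-9, 11))  # -9, -8, ..., 10


def raw_moments(freqs, devs):
    """Moments nu_1..nu_4 about the assumed mean, in class-interval units."""
    n = sum(freqs)
    return [sum(fi * di**k for fi, di in zip(freqs, devs)) / n
            for k in (1, 2, 3, 4)]


def central_moments(nu1, nu2, nu3, nu4):
    """Moments pi_2, pi_3, pi_4 about the mean, obtained from the nu's."""
    pi2 = nu2 - nu1**2
    pi3 = nu3 - 3 * nu1 * nu2 + 2 * nu1**3
    pi4 = nu4 - 4 * nu1 * nu3 + 6 * nu1**2 * nu2 - 3 * nu1**4
    return pi2, pi3, pi4


nu = raw_moments(f, d)
pi2, pi3, pi4 = central_moments(*nu)
beta2 = pi4 / pi2**2
print("nu:", [round(v, 3) for v in nu])
print("pi2, pi4, beta2:", round(pi2, 3), round(pi4, 4), round(beta2, 2))
```

Because the assumed mean of Table 10.10 does not coincide with the mean, the $\nu$ and $\pi$ values differ here, unlike those of Table 10.9.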



Chap. 10] DISPERSION, SKEWNESS, AND KURTOSIS


CORRECTION OF THE MOMENTS FOR GROUPING ERROR 

In computing the mean, $\pi_2$ (or $s$), $\pi_3$, and $\pi_4$ for frequency distributions,
we made use of the mid-values of the classes as representative values.
We saw, in the previous chapter, that the mid-values were incorrect
assumptions but that the errors present tend to offset each other when we
compute the arithmetic mean. This offsetting is also present when the
third moment is computed. It will be remembered that the mid-values
of the classes preceding the modal class tend to be too small, while the
mid-values of the classes following the modal class tend to be too large.
The result is that the various $x$ values tend to be slightly larger (in
absolute value) than they should be, and no offsetting occurs when they are
squared or raised to the fourth power. Consequently the value of $\pi_2$
(and $s$) and the value of $\pi_4$ are apt to be slightly larger than the values
computed from the same data ungrouped. Sheppard's corrections
attempt to offset this upward bias. The corrected moments are
indicated by $\mu$ and are:

$$\mu_1 = \pi_1 = 0,$$

$$\mu_2 = \pi_2 - \frac{1}{12},$$

$$\mu_3 = \pi_3,$$

$$\mu_4 = \pi_4 - \frac{1}{2}\pi_2 + \frac{7}{240},$$

where all computations are in terms of class intervals.
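
As an arithmetic illustration, Sheppard's corrections ($\mu_2 = \pi_2 - 1/12$, $\mu_3 = \pi_3$, $\mu_4 = \pi_4 - \pi_2/2 + 7/240$, all in class-interval units) may be written as a small function. This is only a sketch in Python: the function name is hypothetical, and the $\pi$ values of Table 10.10 are used merely as convenient input, without any claim that the corrections are appropriate for that series.

```python
def sheppard_corrected(pi2, pi3, pi4):
    """Apply Sheppard's corrections; all moments in class-interval units."""
    mu2 = pi2 - 1 / 12
    mu3 = pi3                      # the third moment is left unchanged
    mu4 = pi4 - pi2 / 2 + 7 / 240
    return mu2, mu3, mu4


# Illustrative input only: pi_2, pi_3, pi_4 as computed beneath Table 10.10.
mu2, mu3, mu4 = sheppard_corrected(19.422, 0.0, 837.3045)
print(round(mu2, 4), mu3, round(mu4, 4))
```

Both even moments are reduced, which is the direction needed to offset the upward bias described above.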

If we were to use the class means instead of the class mid-values, the
arithmetic mean could be computed accurately. However, if class means
were used, the values of $\pi_2$ ($s^2$) and $\pi_4$ would be smaller than if
computed from the same data ungrouped. We shall give an arithmetic
illustration to show that, when the mean of each of several groups of figures
is substituted for those figures, $s$ for the series is decreased; that is, it has a
downward bias.

Consider the two following sets of data. The first contains nine differ- 
ent values; the second shows the mean of the first three items repeated 
three times, the mean of the second three items repeated three times, and 
the mean of the last three items repeated three times. The standard 
deviation of the nine different items is 2.58, but the standard deviation of 
the three groups of means is 2.45. 
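
The effect can be verified with a few lines of Python. The two sets of data are not reproduced in this section, so the nine values used below (the integers 1 through 9) are an assumption; they were chosen because they happen to give the same two standard deviations, 2.58 and 2.45, quoted above.

```python
from statistics import mean, pstdev

# ASSUMPTION: hypothetical stand-in for the nine different values.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Replace each item by the mean of its group of three.
groups = [values[i:i + 3] for i in range(0, len(values), 3)]
replaced = [mean(g) for g in groups for _ in g]

# pstdev divides by N, the definition of the standard deviation used here.
print("original items:", round(pstdev(values), 2))
print("group means   :", round(pstdev(replaced), 2))
```

The arithmetic mean is the same for both sets; only the dispersion within each group has been discarded, and that is what lowers the standard deviation.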


For a development, see C. C. Peters and W. R. Van Voorhis, Statistical
Procedures and Their Mathematical Bases, McGraw-Hill Book Co., Inc., New York, 1940,
pp. 72-73 and 84-89.

If a distribution is so flat that the mid-value of each class closely
approximates the corresponding class mean, the value of $s$ (and $\pi_2$ and $\pi_4$) based
on those mid-values may have a downward bias. Such a situation is unusual.

Sheppard's corrections may be applied when we are dealing with a
continuous variable which, graphically, approaches the X-axis
asymptotically at both ends of the distribution. This latter characteristic is often
referred to as "high contact with the X-axis." If these conditions do
not obtain, Sheppard's corrections should not be used, as the corrections
may over-correct. Neither is there justification for applying Sheppard's
corrections if the original observations have not been made with
reasonable accuracy.

In Table 10.4 the value of $s$ was found to be 3.67. If $s$ is computed
from the ungrouped data of Table 8.1, the value obtained is 3.66. Let us
apply Sheppard's correction to the value of $s$ obtained from the frequency
distribution. From expressions previously given, it is apparent that

$$s_{\text{cor.}} = i\sqrt{\pi_2 - 0.0833},$$

where $s_{\text{cor.}}$ is the standard deviation corrected for grouping error and $i$ is
the class interval. From Table 10.4 we get

$$s_{\text{cor.}} = 2.0\sqrt{3.3763 - 0.0833} = 3.62.$$

Sheppard's correction has over-corrected.


See footnote 11 in Chapter 23. Consult also G. R. Davies and W. F. Crowder,
Methods of Statistical Analysis in the Social Sciences, John Wiley and Sons, New York,
1933, pp. 81-82, and W. A. Shewhart, Economic Control of Quality of Manufactured
Product, D. Van Nostrand Co., New York, 1931, pp. 78-79.

When Sheppard's corrections are appropriate, the $\beta$'s and $\alpha$'s may be
computed from the $\mu$'s as follows:

CHAPTER 11


The Problem of Time Series 


Time series have already been seen in graphic form in Chapters 4, 5, 
and 6. The various charts of chronological data which were included in 
those chapters undertook merely to present the series, not to analyze 
them. In this and the following five chapters, we shall examine pro- 
cedures for resolving time series into their more important components. 
The statistical methods which are used for analyzing time series are quite 
distinct from, but closely related to, the methods employed in frequency 
distribution analysis. Although economists have been largely responsible 
for the development of the techniques of time series analysis, the study 
of time series is of interest to workers in many other fields, for example, 
businessmen, sociologists, biologists, geologists, public health workers, 
and others. 


MOVEMENTS IN TIME SERIES 

The time series movements which will occupy our attention are secular
trend, periodic, cyclical, and irregular. One or two of these movements
may overshadow the others in some series. Ordinarily, all four of these
movements will be present in a time series and, when present, are
coexistent. We shall consider each of the four movements in turn.

Secular trend. Over a period of a dozen or more years, a time series
is very likely to show a tendency to increase or to decrease. Chart 11.1, 
which presents monthly data of deposits in New York State savings banks 
from January 1935 to December 1953, shows a pronounced upward trend. 
This series provides an interesting illustration because the trend is 
unusually predominant; virtually no other movements are discernible. 

Another series having an upward trend appears in Chart 11.2, which 
shows annual figures of sales of electric power to residential or domestic 
consumers. One of the underlying factors causing an upward trend for 
this and many other series is the growth of population, and Chart 11.2
has been constructed with a logarithmic vertical scale in order that per
capita figures might also be shown. The per capita sales also show an
upward trend which falls off only a little from the trend of total sales to
residential and domestic consumers. Per capita sales have grown, among
other reasons, because of a continuing improvement in the level of living,
which includes a wider use of electricity in the home as well as the
availability of electricity to more homes.


Chart 11.1. Deposits in New York State Savings Banks, January 1935-
December 1953. (Vertical scale: billions of dollars.) Data from various issues
and supplements of the Survey of Current Business.

Other factors, too, may be responsible for the growth in a time series. 
The natural sciences have been applied to industry and to agriculture so 
as to increase their output enormously. Not always keeping pace with 
these technological changes, but induced by them, have been changes in 
business organization and methods. The growth of the corporation has 
permitted the accumulation of sufficient capital for specialization and
mass production. Scientific management, personnel management, and 
quality control have also played important parts in increasing the pro- 
ductivity of industry. Automation will, undoubtedly, continue to 
increase industrial productivity. Improved methods of marketing and 
better shipping facilities have made commodities available at times and 
places where they were not to be had earlier.

Not all chronological series show upward trends. Some, like the crude
death rate, shown in Chart 11.3, exhibit a downward trend. This
particular declining trend is attributable to better and more widely available
medical knowledge and, in a large sense, reflects again a higher level of
living. An economic series may have a downward trend because a
better or cheaper substitute became available. Thus, synthetic fibers,
such as rayon and nylon, have partially replaced natural fibers for some
uses, and synthetic detergents are being used in place of certain types of
soap. More spectacular, though far beyond the memory of most of us,
was the development of the railroads, which forced into obsolescence most
of the canals in this country. Now the railroads find themselves hard
pressed by competition from trucks, buses, and airplanes.

Chart 11.2. Sales and Per Capita Sales of Electric Power to Residential
or Domestic Consumers in the United States, 1935-1953. (Logarithmic vertical
scale: billions of kilowatt hours for total sales; kilowatt hours per capita.) Data
from U. S. Department of Commerce, Office of Business Economics, Business
Statistics, 1953, p. 132; Survey of Current Business, March 1954, p. S-26; and
various issues of Current Population Reports.

Improvements in the productive process are apt to be rapid at first, 
and demand may be brisk. However, as time goes on, it is often true 
that further technical and managerial improvements have less and less 
effect on output, while at the same time the market does not continue to 
expand as rapidly as before. Growth may also be retarded because of the 
increasing difficulty of obtaining raw material, such as minerals which
must be obtained from smaller deposits and lower-grade ores. We cannot
undertake a complete listing of the factors, including financial ones,
which often combine to slow up the growth of production in an industry.
Whatever the particular causes may be in a given industry, many
authorities believe that not only does relative growth tend to decline, but
eventually further expansion will be physically impossible. Raymond B.
Prescott has characterized the tendency we have described as a "law of
growth,"* which, he says, applies to all industries. This law embraces
four stages: (1) period of experimentation, during which the amount of
growth is small; (2) period of growth into the social fabric; (3) period
during which growth is retarded as a saturation point is approached; (4)
period of stability. Charts 11.4A and 11.4B indicate that the domestic
consumption of rayon filament yarn behaves in this manner. From the
first of these charts it is seen that, over the period 1912-1952, the annual
amount of growth was initially small but gradually increased; from the
second chart it is clear that the annual percentage of growth has gradually
declined.

Chart 11.3. Crude Death Rate in the Registration Area of the United
States, 1900-1953. (Vertical scale: deaths per 1,000.) Data from F. E. Linder and
E. D. Grove, Vital Statistics in the United States, National Office of Vital
Statistics, Washington, 1947, pp. 122-124; Statistical Abstract of the United
States, 1953, p. 61; and National Office of Vital Statistics, Monthly Vital
Statistics Report, July 21, 1953 and February 17, 1954.

* "Law of Growth in Forecasting Demand," by Raymond B. Prescott, Journal
of the American Statistical Association, December 1922, Vol. XVIII, pp. 471-479.

Chart 11.4A. Domestic Consumption of Rayon Filament Yarn, 1920-1952 and Trend as Shown by a Gompertz Curve:
Arithmetic Vertical Scale. For source of data, see Chart 13.10. The Gompertz Curve has been extended to show the general
shape of the curve. The fitting process is described in Chapter 13.

Chart 11.4B. Domestic Consumption of Rayon Filament Yarn, 1912-1952 and Trend as Shown by a Gompertz Curve
Fitted to Data for 1920-1952: Logarithmic Vertical Scale. For sources of data, see Chart 13.10. The Gompertz Curve has been
extended to show the general shape of the curve.

As previously suggested, sometimes the competition faced by an industry
is so keen, or its source of supply so limited, that it experiences a
transition from growth to decline. Such an industry is anthracite coal
mining. The production of anthracite coal for 1880-1953 is shown in
Chart 11.5.

We may study the trend of a time series because we are interested in
the trend itself, or we may wish to eliminate the trend statistically in
order to throw into relief one or more other movements in the series.
The statistical problem consists, first, of deciding the type of trend which
will fit the data adequately and which is a logical description of the data,
and, second, of fitting the trend of the type selected.

Chart 11.5. Production of Pennsylvania Anthracite Coal, 1880-1953.
(Vertical scale: millions of tons.) Data from U. S. Department of Commerce,
Historical Statistics of the United States, 1789-1945, p. 142; Business Statistics,
1953, p. 168; and Survey of Current Business, February 1954, p. S-34.

Periodic movements. A periodic movement is one which recurs,
with some degree of regularity, within a definite period. The most
frequently studied periodic movement is that which occurs within a year and
which is known as seasonal variation, or merely seasonal. Chart 11.6
shows the monthly farm production of milk from January 1941 through
December 1952. The seasonal movement in this chart is quite marked
in relation to the other movements. Notice that the seasonal variation
of milk production is much the same from year to year. This is true, too,
for the data of consumption of newsprint by United States publishers, the
typical seasonal for which is shown in Chart 11.7. In Chapter 14 we
shall see how to ascertain the seasonal pattern when that pattern is
constant or approximately so. However, many series show a seasonal
pattern that is gradually changing with the passage of time. The amount
of advertising space in magazines is such a series, and we shall determine
the seasonal pattern for data of United States magazine advertising in
Chapter 15.

Chart 11.6. Milk Production on Farms in the United States, January
1941-December 1952. Data from Bureau of Agricultural Economics, Farm
Production, Disposition, and Income from Milk, 1951-1952, Table 1.

Chart 11.7. Seasonal Index of Consumption of Newsprint by United
States Publishers, 1944-1952. (Vertical scale: per cent.) Data of Table 14.7.

Climatic conditions, including variations in rainfall, snow and ice,
sunshine, humidity, heat, and wind, produce variations in demand which 
are often reflected in variations in production. Climatic conditions also 
directly affect production in some industries, for example, agriculture and 
outdoor construction. Although nature is primarily responsible for most 
of the seasonal variations exhibited by time series, there are other factors, 
too. The custom of giving gifts at Christmas causes a marked peak in
retail (especially department store) sales in December. Other such
peaks may be expected to appear if advertisers are successful in
promoting widespread gift-giving on Mother's Day and Father's Day.
Peaks of retail activity before Easter and Thanksgiving are indirectly 
attributable to the seasons, since those holidays owe their origin in part 
to weather conditions. However, the urge to change the style of one’s 
clothing or automobile in the spring or fall is also partly the result of 
ostentation. 

The seasonal movement of automobile sales (and the production of 
automobiles and parts as well) is not only due to climatic changes but 
is also the result of certain man-made decisions. In 1935, in an attempt 
to spur a sluggish economy, the automobile show, which would normally 
have been held in January 1936, was moved ahead to November 1935. 
With new models being brought out several months earlier than pre- 
viously, there was, of course, a sudden shift in the seasonal pattern. New 
models of the various makes of cars are not now introduced at exactly the 
same time, but nearly all appear within a month or two of each other. 
The introduction of new models, particularly if they embody style or 
mechanical changes, continues to have a pronounced effect on the sea- 
sonal movement of automobile sales. 

We may be interested in seasonal variation either because we wish
statistically to eliminate seasonal from a time series or because we are 
interested in the seasonal movement itself. In Chapter 16 attention will 
be given to deseasonalizing time series data for the purpose of making 
the other movements (particularly cyclical) more readily discernible. 

Interest in the seasonal movement itself may have any one of several 
objectives. First, it may be that we wish to "iron out" the seasonal so
that the intra-year fluctuation will be less pronounced. Thus, attempts
were made to build up the winter demand for ice cream by advertising:
"Ice cream is one of your best foods. Eat a plate a day." On the
production side, hens have been stimulated to lay in the off (winter) season
by increasing the length of their day with artificial light.




Second, a manufacturing establishment may wish to decrease the
seasonal nature of its activities by producing commodities with
complementary seasonals. Thus, one concern makes sleds and garden
cultivators. On a much larger scale is the objective of an under-water cable
from Britain to France to link the electric power systems of these two
countries. A large proportion of French electrical power comes from 
hydroelectric plants that suffer from water shortages in the late summer 
when Britain's coal-burning generators are working below capacity. On 
the other hand, during most of the winter, when Britain's generators are 
overloaded, France has surplus water to operate its hydroelectric plants. 

Third, one may be interested in a seasonal movement in order to take 
advantage of it. Thus, the housewife tries to buy fruit for canning or 
preserving at the peak of the season when the price is low and when 
quality may be high. 

Although we shall not attempt to deal with them in this book, there are
also periodic movements which may be characterized as intra-month,
intra-week, and intra-day. As an example of an intra-month movement,
consider a commercial bank which may show peak activity around the
first and fifteenth of each month. If the bank is in an area where weekly
factory payrolls must be prepared, its business may show a characteristic
intra-week movement, too, which will depend upon the day (or days) of
the week on which the factories pay their employees. When monthly
and weekly peaks coincide, the staff of the bank may indeed be busy.
An interesting intra-week periodic is that observed by Sears Roebuck
and Company in regard to the number of cash sales per pound of mail.*
During a normal week the figures are: Monday 30, Tuesday 37,
Wednesday 35, Thursday 32, and Friday 31. The business of a restaurant
supplies an illustration of an intra-day movement. With three peaks
each weekday, the manager must plan ahead and have enough food and
enough help for these relatively short, busy times. The power cable
from Britain to France, which was just mentioned, will help to
dovetail dissimilar intra-day demands for electricity in the two countries.
Although no one has yet devised an efficient method of storing power, as
such, it is possible to accumulate water behind a dam. If, during the dry
season or any other time of the year when the dams are not full, France
uses British power any time during a 24-hour period, some French water
is being stored behind French dams to help either country meet peak-load
demands.

* See "Estimating Daily Order Receipts from Weight of Mail," by C. W. Smalley,
The American Statistician, February 1954, pp. 14-15.

Cyclical movements. Cyclical movements are fluctuations which
differ from periodic movements in that they are of longer duration than
a year and also in that they do not ordinarily exhibit regular periodicity.
Business cycles are not random movements because the position of
business at a given point in a cycle is affected by the activity in previous
months and, in turn, affects business in the immediate future. In other
words, the transition from a low point to a high point, or vice versa, is a
progressive development. Cycles appear to operate somewhat on the
principle of a pendulum. Just as a pendulum is pulled by gravity toward
a vertical position, but tends constantly to move past its position of
equilibrium, so it is said that business is drawn toward an equilibrium by
the forces of demand and supply, and so also do the errors in one direction
tend to progress into errors in the opposite direction. Such an
explanation of business cycles is known as the "self-generative theory," usually
associated with the name of Wesley C. Mitchell. But just as the
mechanism impelling a pendulum must be wound up occasionally, so it is
possible that economic activity would attain equilibrium were it not for 
other propulsions of varying degrees of intensity. It is possible to speak 
of cycles in general business or of cycles in particular industries, such as 
residential construction, cattle raising, or textile production. Rarely, 
cycles in a specific industry or business may appear to be inherently 
periodic, but they are, in any event, modified by the position of the cycle 
in general business. Furthermore, since all industries are so inter- 
dependent, a revival or recession in a key industry or industry group soon 
transmits its effect to other branches of activity. 

It appears that cyclical movements of general activity may be gener- 
ated by a concurrence of the same cyclical phase in the activity of several 
important industries; or they might be generated by interferences from 
outside the business world. These interferences might be occasional 
events of considerable magnitude, such as a war, a discovery, unusual 
weather, or some political event; or they might be the simultaneous 
occurrence of several minor events, each reinforcing the effect of the other. 

When cycles appear to have a rough regularity, this regularity may 
possibly be explained by the periodicity of certain of the extraneous 
events which, some authorities believe, are in part responsible. Cycles 
in weather have been suggested. It is more likely, however, that what 
regularity can be observed is due to the fairly constant length of time it 
takes the business world to respond to stimuli. For instance, the time 
it takes for erecting a building, or for foreclosing a mortgage, or even to
decide to go into bankruptcy, is not utterly irregular. Perhaps greater
regularity would be observable were it not for the irregularity of
accidental events.

There are those who reject the concept of self-generating cycles,
believing that cycles are brought about simply by external influences.





Even these observers, however, are interested in noting whether
production and consumption are increasing or decreasing, and especially in
discovering practical measures for stabilization. Whether self-generated
or caused by external factors, it is clear, from Chart 11.10, that there
have been cyclical fluctuations in United States magazine advertising, and
that the cycles have not been of the same length. Chart 11.10 also
illustrates a difficulty frequently encountered in the study of time series.
It has to do with the decision concerning what is a cycle. Does the curve
of Chart 11.10 show about two large cycles or several smaller ones? A
decision may be influenced by the trend used for the series. As will be
seen later, the trend employed was a straight line fitted to the years
1915-1949 and extended through 1953. Had we concerned ourselves
with a shorter period of time, for example, 1933-1953, and made use of a
trend for only those years, two cycles would have appeared for the
twenty-one-year period.

Irregular variations. The irregular variations in a time series are
sometimes divided into two categories: episodic and accidental. When
episodic movements occur in a time series, they may be readily identifiable
in the chart of the series if they are due to specific events, such as
earthquakes, conflagrations, strikes, early or late melting of ice on the Great
Lakes, severe storms, or other occurrences.

The unadjusted data of magazine advertising, shown in Chart 11.8,
would reveal a number of episodic movements to one who is familiar with
that field of activity. For example, shortly after the end of World War
II, there was an increase in the amount of magazine advertising space
used, which resulted in a less-than-seasonal decline in December 1945.
This appears as a sharp peak in the curve of the data adjusted for
seasonal movements, which is also shown in Chart 11.8. An episodic
movement which was important enough to be reflected in annual data appears
in Chart 11.3. The very high death rate in 1918 was the result of an
epidemic of influenza which caused many deaths among civilian and
military personnel.

As mentioned before, an episode may be important enough to generate,
or assist in generating, a cyclical fluctuation. Occasionally it may be
difficult to distinguish between an episodic movement and a cycle.

Accidental movements are minor fluctuations not attributable to 
specific episodes and too small to merit individual consideration. These 
accidental fluctuations may sometimes be of a random nature. The 
irregular variations (accidental and episodic combined) for United States
magazine advertising are shown in Charts 16.7 and 16.8. 

Other movements. The four movements which have been mentioned
are the most prominent ones ordinarily found in time series.
Sometimes investigators find "long cycles," which are of much longer
duration than the usual business cycle and which may last roughly 50
years. Both types of cycles may be present simultaneously and
superimposed on each other. Occasionally, students of time series claim the
existence of more than two cyclical components in a time series.
Intermediate between the long cycle and the business cycle, a movement
called "secondary trend" is sometimes found. In this text we shall give
no further attention to long cycles or secondary trends* but shall
concentrate our attention on the four movements first mentioned.

A GRAPHIC PREVIEW 

The nature of the four leading movements in a time series may be
understood more clearly if we look at some of the charts of data of United
States magazine advertising, which will be considered in more detail
later. The lighter broken line of Chart 11.8 shows the original data in
terms of thousands of agate lines. This curve includes all of the
movements: trend, seasonal, cyclical, and irregular. Chart 11.9 shows the
seasonal variation present in the series, and the solid line of Chart 11.8
shows the appearance of the data after they have been adjusted for
seasonal variation. The cyclical movements are indicated in Chart
11.10. No chart of the irregular movements is included here, but, as
noted before, they may be seen in Charts 16.7 and 16.8.

Chart 11.9. Seasonal Movements of Magazine Advertising in the United
States, 1921-1953. For sources of data, see note to Table 16.3.

PRELIMINARY TREATMENT OF DATA 

Some variations in time series are due to the terms in which the data 
are expressed, and at times it may be advisable to make certain adjust- 
ments before undertaking to analyze a time series. 

Calendar variation. Usually, though not always, there are 365 days
in a year. Although there are 12 months in each year, the months vary
in length from 28 to 31 days. To make matters more complicated, the
different months do not start on the same day of the week, nor does the
same month in successive years so start. Another difficulty has to do
with the number of working days in a month. Not only does the number
of Saturdays and Sundays vary between months, but February, with 28
or 29 days, has Washington's Birthday and Lincoln's Birthday, while
March, with 31 days, may include no holidays. February may include
as few as 18 working days, while March may have as many as 23. The
fluctuation of Easter between March and April also introduces an element
of confusion.

* For a discussion of these movements, see R. A. Gordon, Business Fluctuations,
Harper and Brothers, New York, 1952, pp. 201-299.

Chart 11.10. Cyclical Movements of Magazine Advertising in the United States, 1921-1953. Data from Table 16.5 and from worksheets
(not shown) for the years omitted from that table.

Although it seems impossible to divide the year into quarters
containing the same number of whole weeks, nevertheless some business firms
have tried to minimize the difficulty. A few firms keep records by 4-week
periods. There are 13 such periods in a year, but quarterly data cannot
be kept by this system. A few other firms keep records by quarters, each
quarter being composed of three months: the first two months of four
weeks each and the third of five weeks. Of course, neither of these plans
is satisfactory so long as the first of a given calendar month may occur in
either of two artificial months. And under any plan, the unsystematic
occurrence of holidays results in a different number of working days in
successive artificial months. Movements have been launched to change
the calendar to remedy these defects. One plan suggests identical
quarters; each quarter would contain, not identical months, but three
monthly patterns of thirty or thirty-one days each, these three patterns
being repeated so as to occur four times a year. An extra day, however,
known as Year Day, would occur at the middle of the year.

The statistician is sometimes confronted with the problem of adjusting
a time series for either the number of calendar days in a month or for the
number of working days in a month. If monthly data of the residential
consumption of water are to be adjusted for calendar variation, the
appropriate adjustment would doubtless be on the basis of calendar days
rather than working days. This adjustment is accomplished by dividing
each monthly figure by the number of days in the month, giving consumption
per day. If it is desired to retain the figures in their original magnitude,
the consumption per day may be multiplied by the average number
of days per month, which is 365 ÷ 12 = 30.4167 for a 365-day year.
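The calendar-day adjustment just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `adjust_for_calendar_days` and the consumption figures are hypothetical and do not come from the text.

```python
import calendar


def adjust_for_calendar_days(monthly_totals, year):
    """Return (per_day, restored) for the 12 monthly totals of one year.

    per_day  : each monthly total divided by the days in that month
    restored : per-day figure multiplied by the average days per month
    """
    days_in_year = 366 if calendar.isleap(year) else 365
    avg_days = days_in_year / 12  # 30.4167 for a 365-day year
    per_day, restored = [], []
    for month, total in enumerate(monthly_totals, start=1):
        days = calendar.monthrange(year, month)[1]  # days in this month
        daily = total / days
        per_day.append(daily)
        restored.append(daily * avg_days)
    return per_day, restored


# Hypothetical data: consumption of exactly 10 units per day all year,
# so the raw monthly totals differ only because the months differ in length.
year = 1953
totals = [10 * calendar.monthrange(year, m)[1] for m in range(1, 13)]
per_day, restored = adjust_for_calendar_days(totals, year)
```

With these made-up figures the raw totals vary from month to month, while the adjusted series is constant, which is the point of the adjustment.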

For monthly production data, the adjustment for calendar variation 
would involve consideration of the number of working days rather than 
calendar days in each month. To adjust for the number of working 
days, the following procedure may be followed: 

(1) Ascertain the holidays observed by the industry. These will
differ in different industries and in different localities.

(2) Count the number of holidays observed in each month of each year.

(3) Count the number of Sundays in each month of each year, if
Sunday is not a working day.

(4) Count the number of Saturdays in each month of each year, if
Saturday is not a working day. If Saturday is a half-holiday, the
count should be halved.

(5) For each month, add the counts obtained in (2), (3), and (4). The
resulting figure is the total number of non-working days, including
allowance for an extra holiday if a regular holiday occurred on
Saturday or Sunday. If no such extra holidays are observed,
appropriate subtractions should be made when a holiday occurs on
a Saturday or a Sunday.

(6) Obtain the number of working days for each month by subtracting
the figure obtained in (5) from the number of calendar days.

(7) Divide the original data for each month by the number of working
days in the month to obtain production per working day. The
data may be restored to their original magnitude by multiplying
the production per working day by the average number of working
days per month for the year under consideration. This average
may vary slightly from year to year.
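Steps (1) through (7) can be sketched as follows. The sketch assumes that no extra holiday is granted when a holiday falls on a day that is already non-working, and the function names and the holiday list are hypothetical illustrations rather than anything prescribed by the text.

```python
import calendar
from datetime import date


def working_days(year, month, holidays=(), saturday_off=1.0):
    """Count working days in one month.

    holidays     : dates observed as holidays by the industry (step 1)
    saturday_off : 1.0 if Saturday is not worked, 0.5 for a half-holiday,
                   0.0 if Saturday is a full working day
    Assumption: a holiday falling on an already non-working day is not
    counted twice (no extra holiday is observed).
    """
    n_days = calendar.monthrange(year, month)[1]
    in_month = {h for h in holidays if h.year == year and h.month == month}
    non_working = 0.0
    for day in range(1, n_days + 1):
        d = date(year, month, day)
        weekday = d.weekday()  # Monday = 0, ..., Sunday = 6
        if weekday == 6:
            non_working += 1.0                      # step 3
        elif weekday == 5:
            # step 4; a holiday on a Saturday removes the whole day
            non_working += 1.0 if d in in_month else saturday_off
        elif d in in_month:
            non_working += 1.0                      # step 2
    return n_days - non_working                     # steps 5 and 6


def adjust_for_working_days(monthly_totals, days_worked):
    """Step 7: per-working-day figures, restored to original magnitude."""
    per_day = [t / d for t, d in zip(monthly_totals, days_worked)]
    avg_days = sum(days_worked) / len(days_worked)
    return [p * avg_days for p in per_day]


# Hypothetical holiday list, for illustration only.
holidays_1953 = [date(1953, 2, 12), date(1953, 2, 23)]
feb = working_days(1953, 2, holidays_1953)
mar = working_days(1953, 3, holidays_1953)
half_saturday_mar = working_days(1953, 3, holidays_1953, saturday_off=0.5)
adjusted = adjust_for_working_days([180.0, 220.0], [feb, mar])
```

In this illustration the two hypothetical monthly totals both represent 10 units per working day, so the adjusted figures come out equal even though the raw totals differ.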

It would be entirely inappropriate to adjust some time series for
calendar variation. Clearly it would be spurious to do so for executive, 
administrative, and supervisory salary expenses of most corporations, 
since such salaries are usually paid on a monthly basis irrespective of the 
number of days or working days in a month. For data requiring adjust- 
ment, it is frequently a difficult statistical problem to decide whether to 
adjust for working days or merely for calendar days. For some com- 
modities it can logically be maintained that holidays within a month, far 
from decreasing consumer purchases during that month, may actually 
increase them. If the holiday occurs on the last day of the month and 
the stores are closed, however, it might decrease sales. In organizations 
which receive orders through the mail from a considerable distance, sales 
may be decreased by holidays occurring during the last few days of the 
preceding month. Just what is the logical adjustment to make is often 
very difficult to determine and requires familiarity with the business or 
industry in question. In case of doubt it is always possible to determine 
experimentally what method gives the smoothest results after the adjustment
is made. Such a test provides no conclusive evidence but is only
presumptive. Sometimes a separate adjustment should be made for 
Easter, as explained in Chapter 15. 

Population changes. It has already been noted that one element
in an upward trend may be the increase in population. Data may be
adjusted for population change by dividing the original figures by the
population figures, thus expressing the data on a per capita basis. This
is what was done in Chart 11.2. Alternatively, the population figures
may be put in relative terms with the population for a selected census
year, say 1920, set equal to 1.00, or 100 per cent. If the original data
are then divided by the population relatives, the resulting figures will be
in terms of a fixed (1920) population.
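Both forms of the population adjustment can be illustrated with a short sketch. The function names and the figures below are hypothetical and are chosen only to make the arithmetic easy to follow.

```python
def per_capita(values, population):
    """Express each figure on a per capita basis."""
    return [v / p for v, p in zip(values, population)]


def fixed_population_basis(values, population, base_position):
    """Divide each figure by the population relative (base year = 1.00)."""
    base = population[base_position]
    relatives = [p / base for p in population]
    return [v / r for v, r in zip(values, relatives)]


# Hypothetical series: consumption rises 65 per cent while
# population rises 10 per cent.
values = [200.0, 330.0]
population = [100.0, 110.0]

per_head = per_capita(values, population)
fixed = fixed_population_basis(values, population, base_position=0)
```

The two adjusted series differ only in scale: the second is the per capita series multiplied by the base-year population.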

Price changes. Interest often centers in physical volume changes
rather than changes which have occurred in terms of dollars. Series such
as sales, earnings, cost of materials, and others which are originally
expressed in dollars must be deflated in order to be expressed in terms
which are independent of price changes. Deflation is accomplished by
dividing the dollar series by an appropriate price index series. Table
11.1 shows the average hourly wages paid to employees of Class I railways
in each year from 1947 to 1952. To the right of the column of hourly
wages is given the Consumers' Price Index for the same years. Now, if
hourly wages in dollars for each year are divided by the corresponding
price index (expressed as a decimal), the result is a series of hourly wage
figures adjusted for changes in prices. These are shown in Column (4)
and are referred to as real wages or, specifically, wages in terms of
1947-1949 dollars. Chart 11.11 shows curves of hourly dollar wages and
hourly real wages. Even though prices rose during 1947-1952, hourly
real wages showed a steady increase. Note that the figures shown in
Table 11.1 and Chart 11.11 have to do with average hourly wages. To
ascertain if the railroad employees' purchasing power increased or
decreased over the period, we would have to consider the hours worked
during each year. It happens that the hours worked decreased slightly
from 1947 to 1952, but the real annual wages for employees of Class I
railways nevertheless showed a steady rise.

TABLE 11.1

Average Hourly Wages of Employees of Class I Railways
and Consumers' Price Index, 1947-1952

            Hourly      Price index          Hourly real wages
Year        wages       (1947-49 = 100)      [Col. (2) ÷ Col. (3)]
(1)         (2)         (3)                  (4)

1947        $1.204       95.5                $1.26
1948         1.345      102.8                 1.31
1949         1.464      101.8                 1.44
1950         1.596      102.8                 1.55
1951         1.770      111.0                 1.59
1952         1.872      113.5                 1.65

Data from Eastern Railroad Presidents Conference, A Yearbook of Railroad
Information, edition, p. 74, and Monthly Labor Review, September 1953,
p. 1034.

In Table 11.1 the Consumers' Price Index was used as a deflator. An 
index of wholesale commodity prices, for example, would have been 
entirely unsuitable. Unless a deflator is used that pertains to the data 
being deflated, a satisfactory adjustment for price changes will not be 
obtained. 
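The deflation of Table 11.1 can be reproduced with a short sketch. The wage and index figures are those of the table; the function name `deflate` is merely an illustrative choice.

```python
# Figures from Table 11.1 (employees of Class I railways, 1947-1952).
years = [1947, 1948, 1949, 1950, 1951, 1952]
hourly_wages = [1.204, 1.345, 1.464, 1.596, 1.770, 1.872]  # dollars
price_index = [95.5, 102.8, 101.8, 102.8, 111.0, 113.5]    # 1947-49 = 100


def deflate(values, index, base=100.0):
    """Divide each value by its price index expressed as a decimal."""
    return [v / (i / base) for v, i in zip(values, index)]


real_wages = deflate(hourly_wages, price_index)
for year, real in zip(years, real_wages):
    print(year, round(real, 2))
```

Rounded to cents, the results agree with Column (4) of the table. The same function could be applied with a different deflator, but, as noted above, only an index that pertains to the data gives a satisfactory adjustment.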

Chart 11.11. Average Hourly Wages and Average Real Hourly Wages
of Employees of Class I Railways, 1947-1952. (Vertical axis: dollars per
hour.) Data of Table 11.1. Real wages are in terms of the Consumers'
Price Index, which has 1947-1949 = 100.

Securing comparability. Statisticians for trade associations experi- 
ence considerable difficulty in obtaining prompt reports from all members. 
For instance, 93 firms might report on time one month and 96 the next— 
the latter not necessarily, however, including all the 93 firms. To be 
strictly accurate, a new time series should be constructed each month 
for the entire period including all of, and only, those firms which reported 
promptly for the month in question. Thus, a complete time series one 
month would be computed for the 93 firms, and the next month for 96. 
This is a very laborious procedure. An easier procedure is to make a 
preliminary estimate by computing the percentage of the preceding 
period for only those firms which reported promptly for the current 
month, and to multiply the figure for the preceding month (which 
now includes all firms) by this percentage. A revised figure can be
computed when all the reports have been obtained. If an industry is
expanding and new firms are appearing, it is, of course, desirable to include them.
Increased employment and production may result from increased activity
of existing firms or the appearance of new ones. Similarly, firms may
cease to exist and must be dropped from a reporting list.
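The easier procedure, a preliminary estimate based only on the firms that reported promptly, may be sketched as follows. The firm names, the figures, and the function name are hypothetical.

```python
def preliminary_estimate(prev_total_all_firms, prev_by_firm, curr_by_firm):
    """Estimate the current total from promptly reporting firms only.

    prev_by_firm : firm -> figure for the preceding month (all firms)
    curr_by_firm : firm -> figure for the current month (prompt firms only)
    """
    prompt = [firm for firm in curr_by_firm if firm in prev_by_firm]
    if not prompt:
        raise ValueError("no firm reported in both months")
    prev_sum = sum(prev_by_firm[firm] for firm in prompt)
    curr_sum = sum(curr_by_firm[firm] for firm in prompt)
    ratio = curr_sum / prev_sum  # current month as a ratio of the preceding
    return prev_total_all_firms * ratio


# Hypothetical reports: firm C has not yet reported for the current month.
previous = {"A": 100.0, "B": 200.0, "C": 100.0}
current = {"A": 110.0, "B": 220.0}

estimate = preliminary_estimate(sum(previous.values()), previous, current)
```

Here the prompt firms show a 10 per cent increase, so the preceding total of 400 is raised to a preliminary 440, to be revised when firm C reports.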

Another source of incomparability may be the fact that the unit of 
reporting has changed. If it is merely a question of changing from a 
pound basis to a ton basis, this is a simple matter. Where the product 
has changed in kind, however, it is difficult to find a satisfactory solution.
How, for instance, can we compare the physical production of radios
between 1935 and 1953? Not only was there a difference in the proportion
of radios of different grades sold in the two years, but radios that were
the same with respect to price, weight, number of tubes, or any other 
readily measurable characteristic were still vastly different in their 
capacity to render utility to the consumer. 



Symbols Used in Chapter 12 

a: a constant in the equation $Y_c = a + bX$; the value of $Y_c$ when X = 0;
the Y intercept.

b: a constant in the equation $Y_c = a + bX$; the slope.

N: the number of items in a series.

$\Sigma$: upper-case Greek sigma, meaning "take the sum of."

X: a value of the X series.

$X_1, X_2, X_3, \ldots, X_N$: specific values of the X series.

$\bar{X}$: the arithmetic mean of the X values.

Y: an observed value of the Y series.

$Y_c$: a computed value of the Y series.

$Y_1, Y_2, Y_3, \ldots, Y_N$: specific values of the Y series.

$\bar{Y}$: the arithmetic mean of the Y values.





CHAPTER 12 


Analysis of Time Series: 

SECULAR TREND I - THE STRAIGHT LINE


There are two important reasons for attempting to describe the trend 
of a series by means of a curve. First, it may be desired to measure the 
deviations from the trend. These deviations consist of cyclical, seasonal,
and irregular movements. Frequently, obtaining these deviations is but 
one step in attempting to isolate cycles in order to study them. Second, 
it may be desired to study the trend itself, in order to note the effect of 
factors bearing on the trend, to compare one trend with another, to 
discover what effect trend movements have on cyclical fluctuations, or to 
attempt to forecast the future behavior of the trend. 

The purpose for which measurements are made partly determines the
methods adopted. If the object is solely to isolate cycles, it seems
reasonable to suppose that the trend line chosen should pass through the
cycles in such a way as approximately to allow a balancing between the
positive and negative phases of each cycle. Whether a curve is deemed
to have accomplished this object depends, of course, upon our conception
of what constitutes a cycle in each case. If, on the other hand, the object
is to make comparisons, generalizations, or forecasts, the curve should be
not only logical, but also of such a nature that it can readily be expressed
by a mathematical formula. By means of such a formula a person can,
for instance, say that at a given time a series shows a certain ratio, or a
certain amount, of growth per annum, and that, if this tendency continues,
the trend will reach a certain value at some specified time in the
future. Fitting a trend by a mathematical formula does not, however,
remove the subjective element from trend fitting. The statistician can
vary the behavior of the curve by selection of the type of formula he
employs, or the years to which he fits the curve. It remains true, therefore,
that the statistician decides in advance, upon as objective and logical
a basis as possible, what he thinks the trend ought to look like, and then
selects the mathematical method that will closely approximate this
result.

TREND FITTED BY INSPECTION 

The simplest method of describing a trend graphically is by inspection.
If the trend is a straight line, it may be drawn with the aid of a transparent
ruler or a tightly stretched piece of string. If the trend is non-linear,
it may be drawn freehand or use may be made of a spline, an
adjustable curve ruler, or a French curve.¹



Chart 12.1. Magazine Advertising in the United States, 1915-1953, and 
Straight-Line Trend Fitted by Inspection to the Years 1915-1949. Advertis- 
ing-lineage data from Table 12.2. See notes following the title of Chart 12.3. 

Chart 12.1 shows a fit of a straight-line trend, by inspection, to magazine
advertising in the United States for 1915-1949. Whenever a curve is
fitted to a set of data, a criterion of fit is involved. The trend of Chart
12.1 was drawn through the curve in such a manner that cyclical portions 
above and below the trend line were judged, by inspection, to be about 
equal. The trend line also passes through the approximate average 
(determined by inspection) of the advertising lineage data at the middle 
year, 1932. This highly subjective method is open to the objection that 
may be made to all subjective methods; one determines what answer he 
wants and then proceeds to determine it. However, as has already been 
mentioned, very nearly the same result may be obtained by careful selec- 
tion from among the numerous available mathematical procedures. 

¹ These three devices are available from firms selling artists' and draftsmen's supplies.

LEAST-SQUARES FIT OF STRAIGHT LINE

A mathematical equation not only allows us to draw the trend of a 
time series but provides, also, in the trend equation, a concise definition
of that trend. If the trend itself is to be studied, or is to be extended 
beyond the observed data, it is particularly desirable that the trend be 
described by an objectively determined equation. 

The straight line. The simplest type of curve is the straight line,
which is described by an equation of the type $Y_c = a + bX$, in which X
is the independent variable and $Y_c$ the trend value of the dependent
variable.² Since their values must be determined for each of the series
being analyzed, a and b are referred to as unknowns. They are also called
constants, since, once their values are determined, they do not change.

To take the simplest case, suppose that a = 0 and b = 1. The equation
then becomes $Y_c = X$; and this means that with each increase of
one unit of the independent variable, the dependent variable also increases
one unit. This equation is plotted in the upper left section of Chart 12.2.
Incidentally, it should be observed that all four quadrants are shown in
this chart. Before attempting to plot a curve, it is well to draw up a
table of X and $Y_c$ values, as shown on the chart, in which are recorded the
computed values of Y that correspond to selected values of X. As a
matter of fact, only two points are needed to plot this or any straight line,
and most accurate results are obtained by using two X values a considerable
distance from each other.

Other straight-line equations and their curves are shown in the other
sections of Chart 12.2, an inspection of which yields the following information:
a is the value of Y when X is 0 (the Y value at the X origin), or, as
it is frequently termed, the Y intercept; while b indicates the steepness,
or slope, of the line. When b is positive, the slope is upward; when b
is negative, the slope is downward.

Although the straight-line trend of Chart 12.1 was obtained by inspection
and not mathematically fitted to the data, we can nevertheless
determine its approximate equation. If the origin be taken at 1915, it
will be seen that the curve has a $Y_c$ value of 20, so a = 20. To determine
b, we merely need to ascertain the value of the trend for 1949, which is
43, take the difference between that value and the trend value for 1915,
and divide by the number of elapsed years, 34. This gives

$$\frac{43 - 20}{34} = 0.68,$$

which is the value of the amount of increase in the trend each year.
The equation, then, is

$$Y_c = 20 + 0.68X.$$

Origin, 1915. X units, one year.

² The symbol Y will be used to designate an observed value of the dependent
variable, while $Y_c$ indicates a value that has been computed, usually from a
mathematical equation.

Chart 12.2. Straight-Line Equations and Curves.

Trend equations for time series must always be accompanied by a statement
concerning the origin and the X units. We must specify the X
units, since, as we shall see later, they may be one year, one-half year, or
one month. The origin must be indicated because series of data by years,
months, or other chronological units do not have a zero useful for fitting 
purposes. Consequently, the statistician can select the X-origin where 
he pleases, and we shall see later that it will be advantageous to choose 
that origin at the middle of the chronological series whenever possible. 

If we rewrite the equation for the trend of Chart 12.1, with 1932 as the
origin, we have

$$Y_c = 31.6 + 0.68X.$$

Origin, 1932. X units, one year.

Note that the value of b is the same as before. The new a value may be 
obtained either by reading the trend value for 1932 or by adding 17 times 
the b value to the former a value. The value of b is multiplied by 17 
because 1932 is 17 years removed from 1915. 
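The shift of origin amounts to a single line of arithmetic, sketched here with the inspection trend of Chart 12.1. The function name `shift_origin` is an illustrative choice, not the book's.

```python
def shift_origin(a, b, years_forward):
    """Return (a, b) for the same line with the origin moved forward in time.

    The slope b is unchanged; the new a is the trend value at the new origin.
    """
    return a + b * years_forward, b


# Inspection trend of Chart 12.1: a = 20, b = 0.68, origin 1915.
a_1932, b_1932 = shift_origin(20.0, 0.68, 1932 - 1915)
print(round(a_1932, 1), b_1932)
```

The new a value is 20 + 17(0.68) = 31.56, which is the 31.6 of the rewritten equation. A negative `years_forward` moves the origin back in time.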

Method of least squares. The method of least squares provides a
convenient device for obtaining an objective fit of a straight-line trend
to a series of data. It can also be applied to a number of more complex
trend types, some of which will be discussed in Chapter 13. The
method of least squares accomplishes two objectives:

1. The sum of the vertical deviations of the observed values from the fitted
straight line equals zero. If a vertical line were to be drawn, in Chart
12.3, from each Y value for 1915-1949 to the trend line, the vertical lines
extending upward from the trend line would exactly balance those extending
downward. This trend is not the only straight line from which the
algebraic sum of the deviations equals zero; as a matter of fact, any
straight line (other than vertical) which passes through the point
($\bar{X}$, $\bar{Y}$) fulfills this requirement.

2. The sum of the squares of all these deviations is less than the sum of the
squared vertical deviations from any other straight line. It is because of
this second characteristic that the method of fitting is called the method
of least squares.³ When a curve is fitted to meet this second requirement,
the first requirement is automatically satisfied.⁴

Chart 12.3. Magazine Advertising in the United States, 1915-1953, and
Trend as Shown by a Straight Line Fitted by the Method of Least Squares to
the Years 1915-1949. (Vertical axis: millions of agate lines.) Data of
Table 12.2. Note that inclusion of the first four years had little effect on
the trend because of the length of the period covered. See page 279. Note
also that two trends, one for the first part of the series and one for
the latter part (see page 278), might have been used.

In a sense, a trend line fitted by the method of least squares is analogous 
to the arithmetic mean, since the arithmetic mean is a single value, rather
than a series of values, summarizing a set of data and possessing the two 
characteristics just mentioned. 
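The two characteristics can be checked numerically. The sketch below uses the seven illustrative X, Y pairs of Table 12.1 and the least-squares slope of 0.5 obtained for them later in this chapter; the helper names are hypothetical.

```python
# Illustrative data of Table 12.1.
X = [0, 1, 2, 3, 4, 5, 6]
Y = [2, 1, 3, 2, 4, 3, 5]

x_bar = sum(X) / len(X)
y_bar = sum(Y) / len(Y)


def line_through_means(b):
    """Return (a, b) of the line with slope b passing through the means."""
    return y_bar - b * x_bar, b


def deviations(a, b):
    """Vertical deviations of the observed Y values from the line."""
    return [y - (a + b * x) for x, y in zip(X, Y)]


def sum_of_squares(a, b):
    return sum(d * d for d in deviations(a, b))


a_ls, b_ls = line_through_means(0.5)  # the least-squares line
least = sum_of_squares(a_ls, b_ls)

# Other lines through the means also have deviations summing to zero,
# but their sums of squared deviations are larger.
others = {b: sum_of_squares(*line_through_means(b)) for b in (0.0, 0.3, 0.7, 1.0)}
```

Every line through the point of means satisfies the first characteristic, but only the least-squares line satisfies the second.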

³ It can be demonstrated that the greatest probability of obtaining deviations which
are distributed normally (see Chapter 23) around some computed value or series of
values is obtained when the sum of the squared deviations is at a minimum (see
Appendix S, Section 12.1). If it is believed that deviations from the appropriate norm
are chance errors, it follows that the method of least squares is the appropriate method 
of fitting. The method is also convenient algebraically, as the student can observe in 
connection with correlation analysis and analysis of variance. Time series fluctu- 
ations around a trend line are not, however, independent accidental occurrences, 
and it is to be doubted that there is any special reason for using the method of least 
squares in trend fitting, other than its convenience. Certain of the trends explained 
in this volume are, in fact, fitted by other methods. Some statisticians even argue 
that the least-squares criterion is not appropriate for time series trends, since time 
series are sometimes characterized by extreme deviations not in accordance with 
the normal law. The method of least squares, of course, is particularly influenced by 
extreme deviations because of the squaring process. 

⁴ The mean of the $Y_c$ values is the same as the mean of the Y values. This is
demonstrated in Appendix S, Section 19.1. Before reading that explanation, however, the
reader should peruse the next section of this chapter.






The normal equations. It has already been noted that the equation
for a straight line involves the two constants a and b. For a fitted
straight line, the values of a and b must be determined from the observed
data; consequently, two normal equations must be obtained and solved
simultaneously. These normal equations are:

$$\text{I.}\quad \Sigma Y = Na + b\Sigma X,$$

$$\text{II.}\quad \Sigma XY = a\Sigma X + b\Sigma X^2.$$

Without attempting a derivation⁵ of these normal equations at this
point, we shall make use of a set of simple illustrative data to see how these
two equations are arrived at. The data are shown in Columns 1 and 2 of
Table 12.1 and in Chart 12.4, where it may be seen that there are seven
pairs of X, Y values. We shall therefore first write down seven observation
equations, from which we shall obtain the two normal equations.
Column 3 of Table 12.1 shows the seven observation equations. Since
the observed data do not fall on a straight line, the seven observation
equations are not all consistent with each other. It is the purpose of the
two normal equations to enable us to arrive at a sort of average solution
of these observation equations.

TABLE 12.1

Determination of Normal Equations and of Sums for Fit of Straight Line,
by Method of Least Squares, to Illustrative Data, X and Y

                            Determination of first        Determination of second
                            normal equation               normal equation
            Observation     --------------------------    ----------------------------
            equation        Coefficient   Col. (3) x      Coefficient   Col. (3) x
  X     Y   Y = a + bX      of a          Col. (4)        of b          Col. (6)           XY    X^2
 (1)   (2)  (3)             (4)           (5)             (6)           (7)                (8)   (9)

  0     2   2 = a           1             2 = a           0             0 = 0               0     0
  1     1   1 = a + b       1             1 = a + b       1             1 = a + b           1     1
  2     3   3 = a + 2b      1             3 = a + 2b      2             6 = 2a + 4b         6     4
  3     2   2 = a + 3b      1             2 = a + 3b      3             6 = 3a + 9b         6     9
  4     4   4 = a + 4b      1             4 = a + 4b      4             16 = 4a + 16b      16    16
  5     3   3 = a + 5b      1             3 = a + 5b      5             15 = 5a + 25b      15    25
  6     5   5 = a + 6b      1             5 = a + 6b      6             30 = 6a + 36b      30    36

 21    20                                 20 = 7a + 21b                 74 = 21a + 91b     74    91

The first normal equation is obtained by multiplying each observation
equation by the coefficient of a in that equation and adding. The coefficients
of a, which are 1, are shown in Column 4 of Table 12.1. Column
5 shows the observation equations again (unchanged, since the coefficients
of a were all 1) and their sum, which is the first normal equation.

⁵ For a derivation of the two normal equations, see Appendix S, Section 12.2.

To get the second normal equation, each observation equation is multiplied
by the coefficient of b in that equation and the sum obtained. The
coefficients of b are shown in Column 6 of Table 12.1 and the results of the 
multiplications are given in Column 7. The total of Column 7 is the 
second normal equation. 



Chart 12.4. A Straight Line, Fitted by the Method of Least Squares,
to a Set of Illustrative Values. (Horizontal axis: X values.) Data of
Table 12.1.


The two normal equations may now be set down:

$$\text{I.}\quad 20 = 7a + 21b,$$

$$\text{II.}\quad 74 = 21a + 91b.$$

To solve these simultaneously, we multiply normal equation I by 3 and
subtract it from normal equation II, thus eliminating a and obtaining one
equation with one unknown, b:

$$\begin{aligned}
\text{II.}\quad 74 &= 21a + 91b,\\
(\text{I} \times 3).\quad 60 &= 21a + 63b,\\
14 &= 28b,\\
b &= 0.5.
\end{aligned}$$

To get the value of a, we substitute the value of b in either normal
equation I or II. Using normal equation I:

$$\begin{aligned}
20 &= 7a + 21(0.5)\\
   &= 7a + 10.5,\\
7a &= 9.5,\\
a &= 1.357.
\end{aligned}$$

As a check, the values of a and b may be substituted in normal equation
II, as follows:

$$\begin{aligned}
74 &= 21(1.357) + 91(0.5)\\
   &= 28.5 + 45.5\\
   &= 74.0.
\end{aligned}$$

The equation of the fitted straight line (which is shown on Chart 12.4)
may now be written:

$$Y_c = 1.36 + 0.5X.$$

Notice that it was not necessary, in this case, to state the origin or the X
units, since the X values were not dates.
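The same solution can be obtained mechanically from the five sums. The sketch below applies the elimination described above in general form; the function names are illustrative, and the data are the seven pairs of Table 12.1.

```python
def normal_equation_sums(xs, ys):
    """Return N, sum X, sum Y, sum XY, and sum X squared."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return n, sum_x, sum_y, sum_xy, sum_x2


def solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2):
    """Solve  sum Y = N a + b sum X  and  sum XY = a sum X + b sum X^2.

    Equation I is multiplied by (sum X / N) and subtracted from equation II
    to eliminate a; b is then substituted back in equation I.
    """
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x * sum_x / n)
    a = (sum_y - b * sum_x) / n
    return a, b


X = [0, 1, 2, 3, 4, 5, 6]
Y = [2, 1, 3, 2, 4, 3, 5]

sums = normal_equation_sums(X, Y)
a, b = solve_normal_equations(*sums)
print(sums, round(a, 3), round(b, 3))
```

For these data the multiplier sum X / N is 21/7 = 3, the same factor used in the hand solution.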

The foregoing illustration was a specific instance involving but seven
pairs of values. To be more general, let us write the observation equations
for N pairs of values as follows:

$$\begin{aligned}
Y_1 &= a + bX_1,\\
Y_2 &= a + bX_2,\\
Y_3 &= a + bX_3,\\
&\;\;\vdots\\
Y_N &= a + bX_N.
\end{aligned}$$

If, now, we multiply each of these observation equations by the coefficient
of a (which is 1), they are unchanged and their sum is

$$\text{I.}\quad \Sigma Y = Na + b\Sigma X.$$

This is the first normal equation. To get the second normal equation, we
multiply each observation equation by the coefficient of b in that equation
and add, obtaining:

$$\begin{aligned}
X_1Y_1 &= aX_1 + bX_1^2,\\
X_2Y_2 &= aX_2 + bX_2^2,\\
X_3Y_3 &= aX_3 + bX_3^2,\\
&\;\;\vdots\\
X_NY_N &= aX_N + bX_N^2,\\
\text{II.}\quad \Sigma XY &= a\Sigma X + b\Sigma X^2.
\end{aligned}$$

Note that we write $a\Sigma X$ and $b\Sigma X^2$, rather than $\Sigma aX$ and $\Sigma bX^2$, because
a and b are constants.

We are now in a position to use the two normal equations to determine
a straight-line trend. We shall not find it necessary to set up any more
observation equations; only the normal equations will be needed. For
the illustrative data of Table 12.1, only the sums of Columns 1, 2, 8, and 9
and the value of N are used, giving, for the two normal equations:

$$\text{I.}\quad 20 = 7a + 21b,$$

$$\text{II.}\quad 74 = 21a + 91b,$$

which is the same as the two equations shown at the bottom of Columns 5
and 7 of the table.

We shall make use of two, or more, normal equations not only to fit 
trend lines by the method of least squares in this chapter and in Chapter 
13, but we shall also employ them in Chapters 19, 20, and 21 when dealing 
with linear, non-linear, and multiple correlation and in Chapter 22, as 
well, where we correlate time series. 

Odd number of years. The data of Table 12.2 and the solid curve of
Chart 12.3 show the amount of advertising in magazines in the United
States in millions of agate lines for 1915-1953. We shall fit a straight
line to the data for 1915-1949 and extend that trend line through 1953.
The two normal equations

$$\text{I.}\quad \Sigma Y = Na + b\Sigma X,$$

$$\text{II.}\quad \Sigma XY = a\Sigma X + b\Sigma X^2,$$

will be used to determine the values of a and b for the straight-line trend.
However, it is possible to simplify them in such a manner that simultaneous
solution of the two equations will not be necessary. Owing to
the fact that years constitute the X variable, we must select an origin for
that variable. Now, we can choose any year we wish, and in Table 12.2
it may be seen that the X origin was taken at 1932. By taking the origin
at 1932, the middle year, we have caused the sum of the X values to equal
zero, with the result that the two normal equations may now be written:

$$\text{I.}\quad \Sigma Y = Na,$$

$$\text{II.}\quad \Sigma XY = b\Sigma X^2.$$

TABLE 12.2

Computation of Values for Fit of Straight Line to Data of Magazine
Advertising in the United States, 1915-1949

(Millions of agate lines)

Year       X         Y          XY      Trend values

1915     -17      16.9      -287.3          21.2
1916     -16      20.0      -320.0          21.8
1917     -15      21.3      -319.5          22.4
1918     -14      18.6      -260.4          22.9
1919     -13      25.7      -334.1          23.5
1920     -12      33.6      -403.2          24.1
1921     -11      22.3      -245.3          24.7
1922     -10      24.4      -244.0          25.3
1923      -9      30.2      -271.8          25.9
1924      -8      31.4      -251.2          26.5
1925      -7      31.5      -220.5          27.1
1926      -6      35.5      -213.0          27.7
1927      -5      36.5      -182.5          28.2
1928      -4      36.4      -145.6          28.8
1929      -3      40.6      -121.8          29.4
1930      -2      35.8       -71.6          30.0
1931      -1      28.9       -28.9          30.6
  Subtotal, negative products:  -3,920.7
1932       0      21.2         0            31.2
1933       1      18.7        18.7          31.8
1934       2      24.3        48.6          32.4
1935       3      25.4        76.2          33.0
1936       4      28.5       114.0          33.6
1937       5      32.1       160.5          34.2
1938       6      25.4       152.4          34.7
1939       7      25.6       179.2          35.3
1940       8      26.9       215.2          35.9
1941       9      27.7       249.3          36.5
1942      10      25.7       257.0          37.1
1943      11      33.1       364.1          37.7
1944      12      42.0       504.0          38.3
1945      13      49.0       637.0          38.9
1946      14      54.8       767.2          39.5
1947      15      50.8       762.0          40.0
1948      16      47.8       764.8          40.6
1949      17      43.8       744.6          41.2
  Subtotal, positive products:   6,014.8
1950      18*     45.8*                     41.8
1951      19*     48.1*                     42.4
1952      20*     48.3*                     43.0
1953      21*     50.5*                     43.6

Total      0   1,092.4     2,094.1

* Not used for computing trend.

Data from various issues of the Survey of Current Business.

Now, normal equation I gives the value of a and normal equation II
yields the value of b. Table 12.2 shows the computation of $\Sigma Y$ and of
$\Sigma XY$. N is obtained by counting the number of years or by subtracting
the first year from the last and adding one. The value of $\Sigma X^2$ could have
been computed in Table 12.2. However, this is never necessary for a
time series problem, since the sums of the squares of a series of natural
numbers (1, 2, 3, . . .) may be read from Appendix B or computed by
means of the formula given in that Appendix. The sum of the squares
of the first 17 natural numbers is seen to be 1,785 in Appendix B, so, for
the magazine advertising data, $\Sigma X^2 = 2(1{,}785) = 3{,}570$. We may now
substitute in the two normal equations, obtaining

$$\text{I.}\quad a = \frac{\Sigma Y}{N} = \frac{1{,}092.4}{35} = 31.21 \quad\text{and}$$

$$\text{II.}\quad b = \frac{\Sigma XY}{\Sigma X^2} = \frac{2{,}094.1}{3{,}570} = 0.5866.$$

The trend equation is

$$Y_c = 31.2 + 0.59X.$$

Origin, 1932. X units, 1 year.
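The short-form computation can be verified with the data of Table 12.2. In the sketch below the Y values are those of the table for 1915-1949; the closed-form expression n(n + 1)(2n + 1)/6 for the sum of the squares of the first n natural numbers is the standard formula and is assumed, not confirmed, to be the one given in Appendix B. The function name is an illustrative choice.

```python
def sum_of_squares_natural(n):
    """Sum of the squares of the first n natural numbers (standard formula)."""
    return n * (n + 1) * (2 * n + 1) // 6


# Magazine advertising, 1915-1949, millions of agate lines (Table 12.2).
Y = [16.9, 20.0, 21.3, 18.6, 25.7, 33.6, 22.3, 24.4, 30.2, 31.4, 31.5, 35.5,
     36.5, 36.4, 40.6, 35.8, 28.9, 21.2, 18.7, 24.3, 25.4, 28.5, 32.1, 25.4,
     25.6, 26.9, 27.7, 25.7, 33.1, 42.0, 49.0, 54.8, 50.8, 47.8, 43.8]

n = len(Y)                        # 35 years, an odd number
half = (n - 1) // 2               # 17 years on each side of 1932
X = list(range(-half, half + 1))  # origin at the middle year, so sum X = 0

sum_x2 = 2 * sum_of_squares_natural(half)
a = sum(Y) / n                                    # normal equation I
b = sum(x * y for x, y in zip(X, Y)) / sum_x2     # normal equation II
print(round(a, 2), round(b, 4))
```

Because the X values sum to zero, each normal equation contains a single unknown and no simultaneous solution is needed.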


The trend values for each year are shown in the last column of Table 
12.2. An individual trend value is obtained by substituting the appro- 
priate X value (with sign) in the trend equation. When trend values for 
all of the years are wanted, they may be obtained most expeditiously 
by placing the a value of 31.21 million agate lines opposite 1932 and 
repeatedly adding the value of b for the years 1933-1953. For 1931 to 
1915, the value of b is repeatedly subtracted from the 1932 trend value.® 
The trend of the series is shown in Chart 12.3. Since two points deter- 
mine a straight line, it was drawn by plotting the trend values for 1915 


“ The repeated additions may be made on a calculating machine or, by adding and 
subtotaling each time, on an adding machine. The repeated subtractions may be 
done similarly. If an adding machine which has no subtraction key is to be used, it is 
beat to compute first the trend value for the first year and then obtain the others by 
repeated addition. 
and for 1949 and connecting these points. Selecting the two points well
toward the ends of the X series results in greater mechanical accuracy
in drawing the trend line. The trend has been extended through 1953
although the observed values for 1950-53 were not used to obtain the 
trend. This is a customary procedure, since it is not practical or desirable 
to recompute a new trend each year. Furthermore, it is not desirable to 
have too many high or low values at the ends of a series. At a later point 
in this chapter, it will be explained that, particularly for short series, a 
trend should be fitted to data which begin and end with approximately 
the same stage of a cycle. Since the trend for magazine advertising was 
fitted to a period of 35 years, this consideration is of minor importance. 
The effect of excluding some of the early years or of including the data for 
1950-1953 will be commented upon toward the end of this chapter. 

Chart 12.1 showed a straight-line trend fitted by inspection which was 
found to have the equation 


Yc = 31.6 + 0.68X,

with origin at 1932 and X units 1 year. The least-squares trend equation
was

Yc = 31.2 + 0.59X,

with the same origin and X units. Note that the two equations differ 
very little in regard to their a values, but that b for the inspection trend 
is larger. It is not to be expected that the two should agree. It has 
already been noted that the criteria of fit for the two methods are different. 
Furthermore, the criterion of equal areas for the inspection fit is not 
applied mathematically, but visually, and is therefore subject to errors of 
judgment. 

Even number of years. It may have occurred to the reader that the 
time-saving device of taking the origin at the middle year might fail us 
when it becomes necessary to deal with an even number of years. As a 
matter of fact, we can continue to use the short forms of the normal equa- 
tions but we shall (1) take the origin between the two middle years and 
(2) state the X values in terms of half-years. This has been done in 
Table 12.3, in which the computations are performed for fitting a straight- 
line trend to the production of sweet potatoes in the United States for 
1931-1952. The data are shown graphically in Chart 12.5. 

In Table 12.3 the origin was taken between 1941 and 1942. From this
origin it is one-half year (X = 1) to the middle of 1942 and one-half year
(X = -1) to the middle of 1941. There is, of course, an interval of two
half-year periods between any two adjacent years; therefore, 1940 is
shown as -3, 1943 as 3, and so on. As before, the value of ΣX² need
not be obtained by squaring and summing the X values. The sum of
the squares of a series of odd natural numbers (1, 3, 5, . . .) may be read
from Appendix C or computed by means of the formula given in that
appendix. From Appendix C the sum of the squares of the first 11 odd

Chart 12.5. Production of Sweet Potatoes in the United States, 1931-1952,
and Trend as Shown by a Straight Line Fitted by the Method of Least Squares.
Data of Table 12.3. Vertical scale: millions of bushels.


natural numbers is seen to be 1,771, so ΣX² = 2(1,771) = 3,542. We
may now solve the two normal equations for a and b:


I. a = ΣY/N = 1,331.4/22 = 60.52.

II. b = ΣXY/ΣX² = -3,459.4/3,542 = -0.9767.


And the trend equation is 

Yc = 60.5 - 0.98X.

Origin, 1941-1942. X units, ½ year.

This trend is shown on Chart 12.5 by a broken line. 

Note that the trend for sweet potato production has a downward slope. 
The sign of b in the trend equation is obtained as a result of the computa-
tion of ΣXY, being negative when this sum is negative and positive when
this sum is positive.
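The even-number-of-years device can be illustrated with a short Python sketch. It assumes the production figures as read from Table 12.3 (their sums agree with the printed totals ΣY = 1,331.4 and ΣXY = -3,459.4); the variable names are hypothetical.

```python
# Sweet potato production, 1931-1952, millions of bushels (Table 12.3).
production = [67.3, 86.6, 74.6, 77.7, 81.2, 59.8, 68.1, 68.6, 61.7, 51.7, 62.5,
              65.5, 71.1, 68.3, 61.3, 60.8, 49.6, 43.1, 45.0, 49.8, 28.8, 28.3]

n = len(production)                       # 22 years, an even number
# Origin between the two middle years; X counted in half-years,
# so X takes the odd values -21, -19, ..., -1, 1, ..., 19, 21.
x = [2 * i - (n - 1) for i in range(n)]

sum_y = sum(production)
sum_xy = sum(xi * yi for xi, yi in zip(x, production))
sum_x2 = sum(xi * xi for xi in x)         # twice the sum of squares of the first 11 odd numbers

a = sum_y / n          # normal equation I:  ΣY = Na
b = sum_xy / sum_x2    # normal equation II: ΣXY = bΣX²

print(f"ΣY = {sum_y:.1f}, ΣXY = {sum_xy:.1f}, ΣX² = {sum_x2}")
print(f"Yc = {a:.1f} + ({b:.2f})X   (origin 1941-1942, X units one-half year)")
```

The negative sign of b comes directly from the negative ΣXY, as the text notes.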
Before leaving this illustration, it may be in order to point out that if
production data for sweet potatoes over a longer period, say, 1909-1952,
were to be considered, the trend would not be a straight line. For those
years, the trend would be slightly curved and concave downward.

TABLE 12.3

Computation of Values for Fit of Straight Line to Data of Pro-
duction of Sweet Potatoes in the United States, 1931-1952

(Millions of bushels of approximately 55 pounds)

Year       X         Y          XY     Trend values
                                           Yc
1931     -21      67.3    -1,413.3        81.1
1932     -19      86.6    -1,645.4        79.1
1933     -17      74.6    -1,268.2        77.2
1934     -15      77.7    -1,165.5        75.2
1935     -13      81.2    -1,055.6        73.2
1936     -11      59.8      -657.8        71.3
1937      -9      68.1      -612.9        69.3
1938      -7      68.6      -480.2        67.4
1939      -5      61.7      -308.5        65.4
1940      -3      51.7      -155.1        63.4
1941      -1      62.5       -62.5        61.5
1942       1      65.5        65.5        59.5
1943       3      71.1       213.3        57.6
1944       5      68.3       341.5        55.6
1945       7      61.3       429.1        53.6
1946       9      60.8       547.2        51.7
1947      11      49.6       545.6        49.7
1948      13      43.1       560.3        47.8
1949      15      45.0       675.0        45.8
1950      17      49.8       846.6        43.8
1951      19      28.8       547.2        41.9
1952      21      28.3       594.3        39.9
Total      0   1,331.4    -3,459.4

Subtotals of XY: negative products, -8,825.0; positive products, 5,365.6.

Data from U. S. Department of Agriculture, Agricultural Statistics, 195M, p. 308, and
i9S$ Annual Summary: Acreage, Yield, and Production of Principal Crops, p. 37.


ADAPTING EQUATIONS TO A MONTHLY BASIS 

In the preceding illustrations, trend lines were fitted to annual, rather 
than to monthly, data. The process of fitting a straight-line trend to 
monthly data is no different from that of fitting to annual data, but there 
are 12 times as many observed values to be considered and, because the 
X values become larger, the labor is multiplied by more than 12. It is 
therefore advisable to fit a straight-line trend to annual data and then to 
transform the trend to a monthly basis. The result is ordinarily the same 
as if the trend had been fitted to the monthly data. In some cases, it is 
preferable to obtain the trend from annual data, since the presence of a
very violent seasonal movement may distort a trend fitted to monthly data.
Annual totals — X units, one year. The trend for the annual data
of magazine advertising for 1915-1949 was found to be

Yc = 31.2 + 0.59X,

with origin at 1932 and X units 1 year. The data were in terms of
millions of agate lines of advertising per year; each figure, therefore,
was a total for the year. The value obtained for a (to four figures) was
31.21 millions of agate lines, and a = ΣY/N = 31.21 was the arithmetic
mean of the 35 annual figures. Since 31.21 was the a value for annual
totals, the a value for monthly data is 31.21 ÷ 12 = 2.6008 millions of
agate lines. The value of b (to four figures) was 0.5866. Now this is the
annual increase in the annual totals, that is, the increment for an entire
year. If we divide by 12, we have the monthly increment in the yearly
totals. To state the increment in terms of millions of agate lines per
month, we must divide again by 12. We can perform both operations at
once by dividing by 144, giving a b value of 0.5866 ÷ 144 = 0.004074
millions of agate lines. The equation in monthly terms is

Yc = 2.6008 + 0.004074X.

Origin, June-July 1932. X units, 1 month.

Our adjustment is not yet complete. There are an even number of months
in a year, so the equation has an origin which falls between two months
and is therefore out of step with the monthly data by one-half month.
Consequently, we shift the origin from the point between the two months
to the middle of July. This shift of one-half month increases a by
one-half of the monthly b (0.002037); b remains unchanged. The new
equation, then, is

Yc = 2.6028 + 0.004074X.

Origin, July 1932. X units, 1 month.

Monthly trend values appear in Table 16.3.
Annual totals — X units, one-half year. When a straight-line
trend was fitted to the production of sweet potatoes for 1931-1952, the
resulting equation had X units of ½ year because the data covered an even
number of years.® It would not be particularly meaningful to reduce the
annual trend equation for sweet potato production to a monthly basis,
because sweet potato production does not take place every month in the
year. Neither is an illustration necessary here, since the procedure is
exactly the same as that just described except for the fact that b is divided
by 6 × 12 = 72 instead of by 144. This is so because the b value in the
annual trend equation refers to the increase taking place in the trend
during each six-month period.

Monthly averages — X units, one year. If a straight-line trend has 
been fitted to annual data which are monthly averages for each of an odd 
number of years, it is merely necessary to divide the annual b by 12 and 
shift the origin so that it will be compatible with monthly data. Suppose 
that a trend for the years 1928-1952 has been obtained for the production 
of a manufactured commodity, the annual trend equation being 

Yc = 2,430 + 24.0X.

Origin, 1940. X units, 1 year. 

Since the original data were monthly averages for each year, the value 
of a does not need to be adjusted. The value of b represents the annual 
increase and must be divided by 12 to obtain the monthly trend incre-
ment. The monthly trend equation then is

Yc = 2,430 + 2.0X 

Origin, June-July 1940. X units, 1 month. 

To complete the adjustment, we must shift the origin of the equation 
so that it will coincide with a month instead of falling between two 
months. If the origin is shifted to June 1940, it is merely necessary to 
decrease the value of a by one-half of the value of the monthly b, giving

Yc = 2,429 + 2.0X.

Origin, June 1940. X units, 1 month. 

Monthly averages — X units, one-half year. The procedure is the
same as that just described except that the semiannual b is divided by 6.

® An annual trend equation, such as that for the production of sweet potatoes,
could be shifted so that the X units would be 1 year instead of one-half year. This
merely requires doubling the value of b. However, it would also be necessary to
shift the origin so that it would fall on a year instead of between two years.
The foregoing discussion of the procedure for shifting annual straight-
line trend equations to a monthly basis may be summarized for purposes
of reference as follows:


                                     Type of data
X unit in             Monthly averages              Annual totals
annual equation       a            b                a              b

One year              No change    Divide by 12     Divide by 12   Divide by 144

One-half year         No change    Divide by 6      Divide by 12   Divide by 72

Under all circumstances, the origin must be shifted so that it falls on a 
month instead of between two months. 
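The summary table and the origin shift can be expressed as a small Python helper. This is a sketch under stated assumptions: the function name, its parameters, and the "later"/"earlier" convention for the half-month shift are illustrative and not part of the text; the divisors are those of the table above.

```python
def annual_to_monthly(a, b, data="totals", x_unit="year", shift="later"):
    """Convert an annual straight-line trend Yc = a + bX to a monthly basis.

    data   : "totals" (annual totals) or "averages" (monthly averages for each year)
    x_unit : "year" or "half-year", the X unit of the annual equation
    shift  : "later" moves the origin forward half a month (a + b/2);
             "earlier" moves it back half a month (a - b/2)
    """
    if data not in ("totals", "averages"):
        raise ValueError("data must be 'totals' or 'averages'")
    if x_unit not in ("year", "half-year"):
        raise ValueError("x_unit must be 'year' or 'half-year'")
    if shift not in ("later", "earlier"):
        raise ValueError("shift must be 'later' or 'earlier'")

    months_per_x_unit = 12 if x_unit == "year" else 6
    b_monthly = b / months_per_x_unit     # increment per month instead of per X unit
    a_monthly = a
    if data == "totals":                  # annual totals must also be put on a monthly level
        a_monthly = a / 12
        b_monthly = b_monthly / 12        # net divisor 144 (or 72 for half-year units)

    # The origin now falls between two months; move it half a month onto a month.
    half_step = b_monthly / 2
    a_monthly = a_monthly + half_step if shift == "later" else a_monthly - half_step
    return a_monthly, b_monthly


# Magazine advertising: annual totals, X units one year, origin moved to July 1932.
print(annual_to_monthly(31.21, 0.5866))
# Manufactured commodity: monthly averages, X units one year, origin moved to June 1940.
print(annual_to_monthly(2430.0, 24.0, data="averages", shift="earlier"))
```

The text rounds a to 2.6008 before shifting and so reports 2.6028 for the magazine series; carrying the unrounded quotient gives a value about 0.0001 larger.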

SELECTING THE PERIOD FOR TREND ANALYSIS 

In general, it is desirable to use as long a period as possible when a trend 
is to be determined. This practice results in a more reliable statement 
of the trend and one which is less affected by one or two large cyclical 
movements. 

If the nature of the trend of a series has changed, it may be necessary 
to use two trends. It may or may not be possible to splice the two 
trends together. The depression of the 1930’s was so severe that, for 
some series, it now seems to have been more in the nature of a readjust-
ment. Consequently, one may occasionally use one trend for the years
before the readjustment but a different one for the years following the 
readjustment. It would have been possible to fit two trends to the data 
of magazine advertising, shown in Chart 12.3, but we chose to show those 
data in terms of a single trend covering a longer period of time. 

It is important that the first few and the last few years of a series be 
given special consideration before a decision is made concerning the 
period to be used. If the data cover only ten or fifteen years, this is of 
particular importance; for longer periods, it is less important. The first 
year should not be one of depression and the last year one of prosperity, 
since that will cause an upward trend to be too steep; b will be too large.
Conversely, if the first year was one of prosperity while the last year was 
one of depression, the slope, if upward, would not be steep enough; b 
would be too small. To avoid the introduction of such extraneous factors 
in the slope, the first and last years should be on opposite sides of the cycle 
(not on opposite sides of the trend) and about the same distance above, 
or below, the trend. Thus, in Chart 12.6 CD = CD' and a trend fitted 
to data extending from D to D' will have the correct slope. 

Not only should the slope be correct, but the level of a trend should also
be suitable. If a trend were fitted to the data of Chart 12.6 running from
D to D', the level of the trend would be too high. The trend should be
fitted to a period running from B to B'. This would result in a proper
level for the trend, since the areas ABE and A'B'E' are each one-fourth
of a cycle. The first and last years should not both be low points of 
particularly deep depressions, since they would then lower the level of the 
trend; a would be too small. Conversely, the end years should not both 



Chart 12.6. Cycles and Appropriate Trend. 


be high points of marked prosperity, since they would then raise the level 
of the trend unduly. 

The trend for magazine advertising was fitted to the years 1915-1949. 
Although, as may be seen in Chart 12.3, the series does not begin and end 
with the same phase of a cycle, the trend is satisfactory because the period 
covered is relatively long. What changes would occur in the trend 
equation if some of the early years had been omitted or some of the later 
years included? The equation obtained earlier for the period 1915-1949 
was 

Yc = 31.2 + 0.59X,

with origin at 1932 and X units 1 year. Continuing to use the same 
origin and X units, the reader can verify, by computations based upon 
Table 12.2, that if the first four years were omitted, the trend equation 
for 1919-1949 would be 

Yc = 31.8 + 0.60X.

In view of the rules laid down in the preceding paragraphs, 1919-1949 
may be more appropriate than 1915-1949 as the period for which a trend 
should be determined. However, owing to the length of the series, the 
results differ little; the 1919-1949 equation, if drawn on Chart 12.3, could 
be distinguished from the 1915-1949 trend only toward the ends. 
If the last four years were to be added, the trend equation for 1915-1953
would be

Yc = 31.6 + 0.67X. 

This equation, too, if drawn on Chart 12.3, could be distinguished from 
the 1915-1949 trend only toward the ends. 

SELECTING THE TYPE OF TREND 

Since the discussion, so far, has been limited to trends fitted by inspec- 
tion and to straight lines fitted by the method of least squares, there is 
not much that can be said at this point concerning the selection of the 
type of trend. We shall be in a better position to consider which one of a 
number of possible trend types is most appropriate after some additional 
types have been described in the following chapter. 

As a first step, the original data should always be plotted and examined. 
It may even be worth while to sketch in a tentative trend by inspection. 
In some instances a trend fitted by inspection may suffice; but when the 
trend itself is to be studied, or extended, a mathematical equation should 
be used. If examination of the charted data indicates that the trend is 
not linear, one of the trend types described in Chapter 13 may be appro- 
priate. The trend type chosen should be one which is logical in relation 
to the series which it undertakes to describe and in relation to the forces 
affecting that series. It is for this reason that a straight line, which indi- 
cates a constant amount of increase or decrease, cannot be expected to 
constitute an appropriate trend of a series over an extended period of 
time. 



Symbols Used in Chapter 13 


a: a constant in various trend equations.

A: a constant in an orthogonal polynomial of the first, or higher, degree.

b: a constant in various trend equations.

B: a constant, associated with X₁, in an orthogonal polynomial of the
first, or higher, degree.

c: a constant in a polynomial of the second, or higher, degree. As a sub-
script, c distinguishes a computed value from an observed value;
see Yc.

C: a constant, associated with X₂, in an orthogonal polynomial of the
second, or higher, degree.

d: a constant in a polynomial of the third, or higher, degree.

D: a constant, associated with X₃, in an orthogonal polynomial of the
third, or higher, degree.

e: a constant in a polynomial of the fourth, or higher, degree.

f: a constant in a polynomial of the fifth, or higher, degree.

k: the asymptote of an asymptotic growth curve.

k₀, k₁, k₂: When one logistic curve is built upon part of another, k₀ is the
upper asymptote of the first logistic curve and k₁ and k₂ are, respec-
tively, the lower and upper asymptotes of the second logistic curve.

μ: lower-case Greek mu, used to assist in determining the trend values
for a logistic curve.

n: for a modified exponential or a Gompertz curve, the number of years
in each third of the series; for a logistic curve, the number of time units
between x₀ and x₁ or between x₁ and x₂.

N: the number of items in a series.

r: a subscript of X in an orthogonal polynomial; it may have a value of
1, 2, 3, etc.

Σ: upper-case Greek sigma, meaning "take the sum of."

Σ₁, Σ₂, Σ₃: respectively, the sums of values for the first, second, and third
equal parts of a series.

x₀, x₁, x₂: when fitting a logistic curve, the years associated with y₀, y₁,
and y₂.

X: a value of the X series.

X₁, X₂, X₃, etc.: variables in orthogonal polynomials.

y₀, y₁, y₂: three selected Y values used for fitting a logistic curve.

Y: an observed value of the Y series.

Yc: a computed value of the Y series.

!: factorial; 5! = 1 × 2 × 3 × 4 × 5.
CHAPTER 13


Analysis of Time Series:

SECULAR TREND II— NON-LINEAR TRENDS 


Chapter 12 discussed only the simplest type of trend equation, the 
straight line. It was noted that, for short periods of time, a straight line 
may provide a reasonably good description of the trend of a series, but 
that for longer periods a curved line of some sort may be called for. This 
chapter will describe the properties of several non-linear equation types, 
will explain how to fit them, and will give some indication of how to 
proceed in choosing among the various trend types. 

SIMPLE POLYNOMIALS 

This family of curves has as its most elementary representative the 
straight line, which, it will be remembered, has two constants. The 
straight line and four other polynomials are shown below: 

First-degree (straight line) . .  Yc = a + bX.

Second-degree . . . . . . . . .   Yc = a + bX + cX².

Third-degree  . . . . . . . . .   Yc = a + bX + cX² + dX³.

Fourth-degree . . . . . . . . .   Yc = a + bX + cX² + dX³ + eX⁴.

Fifth-degree  . . . . . . . . .   Yc = a + bX + cX² + dX³ + eX⁴ + fX⁵.


When a third constant is added to the equation for the straight line, 
the second-degree curve, which has one bend, is obtained. Because of the 
bend in the second-degree curve, the slope of the curve is continually 
changing. If a sufficient number of X values are included, the second-
degree curve will have a positive slope in one portion and a negative slope
in another. This may be observed in Chart 13.1, which shows eight 
second-degree curves. 

Each constant added to the second-degree equation may introduce an 
additional bend in the curve. Thus, a third-degree curve may have two 
bends, as shown in Chart 13.2. The lower of the two curves in Chart
13.2 shows clearly the fact that the slope of a third-degree curve may
change twice from positive to negative or from negative to positive.
Since such a change in the direction of slope may occur three times in a
fourth-degree curve and four times in a fifth-degree curve, it follows that
fourth- and fifth-degree curves hardly coincide with the concept of secular
trend which is of interest to us. Consequently, we shall give no further
attention to fourth- and fifth-degree curves, but shall describe the process
of fitting the second-degree curve in some detail and briefly consider the
third-degree curve.

Second-degree curve. The second-degree curve is only a little more
complicated than a straight line, since it involves merely the addition of
cX² to the equation for a straight line, giving

Yc = a + bX + cX².

The eight second-degree equations, which have been plotted in Chart
13.1, give some idea of the flexibility of this equation type. Portions
of such a curve fitted to a time series may slope upward or downward
(or upward in one portion and downward in another) and may be concave
upward or concave downward. While a straight line indicates a con-
stant amount of increase or decrease, a second-degree curve involves
increasing or decreasing amounts of increase or decrease. More specifi-
cally: the second differences of the values obtained from the expression
Yc = a + bX + cX² are constant.¹
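The constancy of the second differences is easy to check numerically. The sketch below uses the coefficients the text gives for section 2 of Chart 13.1 (a = -1, b = 2, c = -0.3); the function name is hypothetical, and differences are taken here as later value minus earlier value.

```python
def second_degree(x, a, b, c):
    """Trend value Yc = a + bX + cX² for a single X."""
    return a + b * x + c * x ** 2


xs = list(range(-3, 7))                                     # X = -3, -2, ..., 6
values = [second_degree(x, -1.0, 2.0, -0.3) for x in xs]

# First differences: change in Yc from one X to the next.
first = [later - earlier for earlier, later in zip(values, values[1:])]
# Second differences: change in the first differences.
second = [later - earlier for earlier, later in zip(first, first[1:])]

for x, y in zip(xs, values):
    print(f"X = {x:2d}   Yc = {y:5.1f}")
print("first differences: ", [round(d, 1) for d in first])
print("second differences:", [round(d, 1) for d in second])
```

Every second difference equals 2c, here -0.6, whatever the values of a and b.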

Fitting the second-degree curve. Since there are three constants or 
unknowns in the second-degree curve, the following three normal equa- 
tions are required: 

I. ΣY = Na + bΣX + cΣX².

II. ΣXY = aΣX + bΣX² + cΣX³.

III. ΣX²Y = aΣX² + bΣX³ + cΣX⁴.

However, we are dealing with a time series, and the origin may be taken 
at the middle year (or other time unit), or between the two middle years, 


¹ This may be seen by considering the Yc values for section 2 of Chart 13.1, for
which the equation is Yc = -1 + 2X - 0.3X²:

  X       Yc      First         Second
                  difference    difference
 -3     -9.7
 -2     -6.2      -3.5
 -1     -3.3      -2.9          -0.6
  0     -1.0      -2.3          -0.6
  1      0.7      -1.7          -0.6
  2      1.8      -1.1          -0.6
  3      2.3      -0.5          -0.6
  4      2.2       0.1          -0.6
  5      1.5       0.7          -0.6
  6      0.2       1.3          -0.6
as before, with the result that the summations of all odd powers of X
are zero. Therefore, the three normal equations become

I. ΣY = Na + cΣX².

II. ΣXY = bΣX².

III. ΣX²Y = aΣX² + cΣX⁴.

Notice that, instead of having to solve three equations simultaneously,
the value of b is obtained from Equation II, while the values of a and c
are obtained by solving Equations I and III simultaneously. The use of the
middle year as the origin has enabled us to save much labor.

Chart 13.3. United States Production of Crude Gypsum, 1935-1952, and
Trend as Shown by a Second-Degree Curve. Data of Table 13.1.

Table 13.1 and Chart 13.3 show the production of crude gypsum in the 
United States for the years 1935 to 1952 inclusive. The upward trend 
of the series is not linear, and these data will form the basis of our illustra- 
tion of a fit of a second-degree curve. The three normal equations call 
for the numerical values of N, ΣY, ΣXY, and ΣX²Y, which may be
obtained from Table 13.1, and the values of ΣX² and ΣX⁴ (for the first
nine odd natural numbers), which may be read from Appendix C. Sub-
stituting in the three normal equations gives

I. 88,159 = 18a + 1,938c.

II. 358,287 = 1,938b.

III. 10,080,943 = 1,938a + 374,034c.
TABLE 13.1

Computation of Values for Fit of Second-Degree Curve to Production of
Crude Gypsum in the United States, 1935-1952
(Thousands of short tons)

                                                        Computation of trend values
         Produc-                                                                Trend
Year   X   tion         XY         X²Y      X²     a + bX       cX²       value
           Y                                                                      Yc
1935  -17  1,881    -31,977     543,609    289    1,371.3    1,029.6      2,401
1936  -15  2,676    -40,140     602,100    225    1,741.0      801.6      2,543
1937  -13  3,014    -39,182     509,366    169    2,110.8      602.1      2,713
1938  -11  2,671    -29,381     323,191    121    2,480.5      431.1      2,912
1939   -9  3,196    -28,764     258,876     81    2,850.3      288.6      3,139
1940   -7  3,664    -25,648     179,536     49    3,220.0      174.6      3,395
1941   -5  4,706    -23,530     117,650     25    3,589.8       89.1      3,679
1942   -3  4,634    -13,902      41,706      9    3,959.5       32.1      3,992
1943   -1  3,919     -3,919       3,919      1    4,329.3        3.6      4,333
1944    1  3,754      3,754       3,754      1    4,699.0        3.6      4,703
1945    3  3,802     11,406      34,218      9    5,068.8       32.1      5,101
1946    5  5,615     28,075     140,375     25    5,438.5       89.1      5,528
1947    7  6,198     43,386     303,702     49    5,808.3      174.6      5,983
1948    9  7,044     63,396     570,564     81    6,178.0      288.6      6,467
1949   11  6,491     71,401     785,411    121    6,547.8      431.1      6,979
1950   13  8,119    105,547   1,372,111    169    6,917.5      602.1      7,520
1951   15  8,705    130,575   1,958,625    225    7,287.3      801.6      8,089
1952   17  8,070    137,190   2,332,230    289    7,657.0    1,029.6      8,687
Total   0  88,159   358,287  10,080,943    . . .

Data from U. S. Department of Commerce, Office of Business Economics, Business Statistics, 1953
Biennial Edition, p. 185.


The value of b is given by the second normal equation:

1,938b = 358,287;
b = 184.875.

Next, the values of a and c are obtained by solving normal equations I
and III simultaneously. The steps are:

1. Multiply normal equation I by 193 and subtract normal equation III
from this new form of normal equation I, thus obtaining² the value of a.

(I × 193). 17,014,687 = 3,474a + 374,034c.

III. 10,080,943 = 1,938a + 374,034c.

6,933,744 = 1,536a.

a = 4,514.15625.

² The multiplying factor 193 was obtained by dividing the coefficient of c in nor-
mal equation III by the coefficient of c in normal equation I. That is, ΣX⁴ ÷ ΣX²
= 374,034 ÷ 1,938 = 193. When solving two equations simultaneously, either
unknown may be eliminated by multiplying one of the equations by the quotient of
the coefficients of the unknown which is to be eliminated and subtracting one equation
from the other.
2. Substitute the value of a in normal equation I to obtain the value of c.

I. 88,159 = 18(4,514.15625) + 1,938c.

1,938c = 6,904.1875.
c = 3.56253225.

3. Substitute, in normal equation III, the values obtained for a and c.
This serves as a check of the computations in steps 1 and 2.

III. 10,080,943 = 1,938(4,514.15625) + 374,034(3.56253225),

= 10,080,943.

The second-degree trend equation may now be written:

Yc = 4,514.16 + 184.875X + 3.5625X².

Origin, 1943-1944. X units, ½ year.

The computation of the trend values is shown in the last four columns 
of Table 13.1. The trend, shown in Chart 13.3, is the result of plotting 
these trend values. Note that the production of crude gypsum seems to 
show four cycles during the years 1935-1952. 
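The gypsum computation can be retraced with a short Python sketch. It assumes the production figures as read from Table 13.1 (their sums agree with the printed totals 88,159, 358,287, and 10,080,943) and solves normal equations I and III by determinants rather than by the elimination steps shown above; the variable names are hypothetical.

```python
# Crude gypsum production, 1935-1952, thousands of short tons (Table 13.1).
production = [1881, 2676, 3014, 2671, 3196, 3664, 4706, 4634, 3919,
              3754, 3802, 5615, 6198, 7044, 6491, 8119, 8705, 8070]

n = len(production)                          # 18 years, an even number
x = [2 * i - (n - 1) for i in range(n)]      # half-year units: -17, -15, ..., 15, 17

sum_y = sum(production)
sum_xy = sum(xi * yi for xi, yi in zip(x, production))
sum_x2y = sum(xi ** 2 * yi for xi, yi in zip(x, production))
sum_x2 = sum(xi ** 2 for xi in x)
sum_x4 = sum(xi ** 4 for xi in x)

# Equation II alone gives b, because the odd powers of X sum to zero.
b = sum_xy / sum_x2

# Equations I and III:  ΣY   = N a   + ΣX² c
#                       ΣX²Y = ΣX² a + ΣX⁴ c
det = n * sum_x4 - sum_x2 ** 2
a = (sum_y * sum_x4 - sum_x2 * sum_x2y) / det
c = (n * sum_x2y - sum_x2 * sum_y) / det

trend = [a + b * xi + c * xi ** 2 for xi in x]
print(f"Yc = {a:.2f} + {b:.3f}X + {c:.4f}X²   (origin 1943-1944, X units one-half year)")
print("trend, 1935 and 1952:", round(trend[0]), round(trend[-1]))
```

Solving by determinants and by the multiply-and-subtract procedure of the text are algebraically the same; the determinant form simply avoids choosing a multiplying factor.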

THIRD-DEGREE CURVE 

By adding one more constant to the equation for a second-degree curve, 
we are enabled to put one more bend into the curve. While a straight line 
has only one slope, a second-degree curve (Chart 13.1) slopes in a positive 
direction at one stage and in a negative direction at another, and a third- 
degree curve (Chart 13.2) may include three directions of slope. 

Four normal equations are required for a third-degree curve: 

I. ΣY = Na + bΣX + cΣX² + dΣX³.

II. ΣXY = aΣX + bΣX² + cΣX³ + dΣX⁴.

III. ΣX²Y = aΣX² + bΣX³ + cΣX⁴ + dΣX⁵.

IV. ΣX³Y = aΣX³ + bΣX⁴ + cΣX⁵ + dΣX⁶.

Again, if the X origin is taken at the middle of the period, the odd powers
of X will total zero, leaving these equations:

I. ΣY = Na + cΣX².

II. ΣXY = bΣX² + dΣX⁴.

III. ΣX²Y = aΣX² + cΣX⁴.

IV. ΣX³Y = bΣX⁴ + dΣX⁶.
With the equations in this form, we do not have to solve four simultaneous
equations, although that would have been necessary if the origin had been
taken anywhere other than at the middle of the period. The values of
a and c are obtained by solving Equations I and III simultaneously;
simultaneous solution of Equations II and IV gives the values of b and d.
Only one column of figures, in addition to those shown in Table 13.1, must
be computed; it is a column headed X³Y, the total of which gives ΣX³Y.
Note that Equations I and III are exactly the same as for the second-
degree curve. Consequently, for a given set of data, the values of a and c
will be the same for a second-degree curve and for a third-degree curve.
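The two-pairs-of-equations procedure can be sketched as follows. The helper names and the small test series are hypothetical; the series is generated from a known third-degree equation only so that the recovered constants can be checked.

```python
def solve_pair(p, q, r, s, u, v):
    """Solve  p*m + q*k = u  and  r*m + s*k = v  for (m, k) by determinants."""
    det = p * s - q * r
    return (u * s - q * v) / det, (p * v - u * r) / det


def fit_third_degree(xs, ys):
    """Fit Yc = a + bX + cX² + dX³ when the X values are symmetric about zero."""
    if sorted(xs) != sorted(-x for x in xs):
        raise ValueError("X values must be symmetric about the origin")
    n = len(xs)
    sx2 = sum(x ** 2 for x in xs)
    sx4 = sum(x ** 4 for x in xs)
    sx6 = sum(x ** 6 for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x ** 2 * y for x, y in zip(xs, ys))
    sx3y = sum(x ** 3 * y for x, y in zip(xs, ys))
    a, c = solve_pair(n, sx2, sx2, sx4, sy, sx2y)      # Equations I and III
    b, d = solve_pair(sx2, sx4, sx4, sx6, sxy, sx3y)   # Equations II and IV
    return a, b, c, d


# Hypothetical series lying exactly on Yc = 1 + 2X + 3X² + 0.5X³.
xs = list(range(-3, 4))
ys = [1 + 2 * x + 3 * x ** 2 + 0.5 * x ** 3 for x in xs]
print(fit_third_degree(xs, ys))
```

Since Equations I and III do not involve b or d, the values of a and c returned here are the same ones a second-degree fit to the same data would give.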

Orthogonal polynomials. A minor disadvantage of polynomial
equations of the type described is that each additional constant added
to an equation requires that some of the constants previously obtained
be abandoned and new constants computed to take their place. Thus,
a second-degree curve uses the same value for b as a straight line, but
requires a different value for a; a third-degree curve uses the same values
for a and c as a second-degree curve, but requires a new value for b;
a fourth-degree curve uses the same values for b and d as a third-degree
curve, but new values must be calculated for a and c; and so on. Orthog-
onal polynomial equations involve a transformation of such a nature
that, as new constants are added, the old constants remain the same.

Such equations are very convenient to use, since we merely build up our
equation by adding new constants until a satisfactory fit is obtained and
simultaneous solution of equations is avoided. There is thus no lost
motion, and the labor involved becomes progressively less than that
required to fit a curve by the ordinary method for equations of third
degree and higher. The trend values obtained by the two methods
are exactly the same.

Although the labor required for fitting is modest, the theory of orthog-
onal polynomials³ is beyond the scope of this text, and will not be ex-
plained here. Whereas the ordinary third-degree polynomial is of the
type

Yc = a + bX + cX² + dX³,

the orthogonal polynomial is

Yc = A + BX₁ + CX₂ + DX₃.

In working with orthogonal polynomials, the X origin is conveniently
taken at the middle, so that ΣX = 0. If N is odd, the X values are
taken as . . . -3, -2, -1, 0, +1, +2, +3 . . . in the usual fashion;
if N is even, they are taken as . . . -2.5, -1.5, -.5, +.5, +1.5, +2.5


³ See R. A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd,
Edinburgh, 1936 (sixth edition), pp. 149-150, and Hafner Publishing Co., New York,
1950 (eleventh edition), pp. 147-153. See also R. A. Fisher and F. Yates, Statistical
Tables for Biological, Agricultural and Medical Research, Hafner Publishing Co., New
York, 1949 (third edition), pp. 23-25 and 70-80.
. . . . The variables X₁, X₂, X₃, . . . are derived from the moments
of the X series. In form easy to use, these are:


X₁ = X.

X₂ = X₁² - (N² - 1)/12.

X₃ = X₁³ - [(3N² - 7)/20]X₁.

Xᵣ = X₁Xᵣ₋₁ - {(r - 1)²[N² - (r - 1)²] / 4[4(r - 1)² - 1]}Xᵣ₋₂.


N is, as usual, the number of items in the series (the number of years or
months) and r is the subscript of the X under consideration. Each of
these equations is worked out, and in the computation table there will
be column headings for X₁, X₂, and X₃. The constants A, B, C, and
D will be obtained as follows:


A = ΣY / N.

B = 12ΣX₁Y / [N(N² - 1)].

C = 180ΣX₂Y / [N(N² - 1)(N² - 4)].

D = 2,800ΣX₃Y / [N(N² - 1)(N² - 4)(N² - 9)].

Coefficient of Xᵣ = (2r)!(2r + 1)! ΣXᵣY / [(r!)⁴N(N² - 1)(N² - 4) . . . (N² - r²)].


In obtaining the trend values, the constants are multiplied by X₁, X₂,
and X₃ instead of X, X², and X³.
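The formulas for X₁, X₂, X₃ and for A, B, C, D can be tried out with a short Python sketch. The function name and the test series are hypothetical; the series lies exactly on a third-degree curve so that the orthogonal fit should reproduce it.

```python
def orthogonal_third_degree(ys):
    """Fit Yc = A + B*X1 + C*X2 + D*X3 to equally spaced data (N of at least 4).

    X is centered so that ΣX = 0: integers when N is odd,
    ..., -1.5, -0.5, +0.5, +1.5, ... when N is even.
    """
    n = len(ys)
    if n < 4:
        raise ValueError("at least four items are needed for a third-degree fit")
    xs = [i - (n - 1) / 2 for i in range(n)]
    x1 = xs
    x2 = [x ** 2 - (n ** 2 - 1) / 12 for x in xs]
    x3 = [x ** 3 - (3 * n ** 2 - 7) / 20 * x for x in xs]

    A = sum(ys) / n
    B = 12 * sum(u * y for u, y in zip(x1, ys)) / (n * (n ** 2 - 1))
    C = 180 * sum(u * y for u, y in zip(x2, ys)) / (n * (n ** 2 - 1) * (n ** 2 - 4))
    D = 2800 * sum(u * y for u, y in zip(x3, ys)) / (
        n * (n ** 2 - 1) * (n ** 2 - 4) * (n ** 2 - 9))

    trend = [A + B * u + C * v + D * w for u, v, w in zip(x1, x2, x3)]
    return (A, B, C, D), trend


# Hypothetical series on Yc = 1 + 2X + 3X² + 0.5X³, X = -3, ..., 3.
observed = [1 + 2 * x + 3 * x ** 2 + 0.5 * x ** 3 for x in range(-3, 4)]
constants, trend = orthogonal_third_degree(observed)
print("A, B, C, D =", constants)
print("largest gap between trend and data:",
      max(abs(t - y) for t, y in zip(trend, observed)))
```

Each constant is obtained from its own sum, so stopping after B or C gives the straight-line or second-degree fit without recomputing anything.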


USE OF LOGARITHMS 

Straight line fitted to logarithms. A glance at Chart 13.4 makes it
quite apparent that a curve of the type Yc = a + bX would not be a satis-
factory description of the trend of the production of asphalt for the period
shown. A second-degree curve might be used, but a more logical trend
equation is available. A second-degree curve fitted to this series would
behave in such a fashion that the amount of increase each year would be
increasing by a constant amount; this is the same thing as saying that the
second difference of the trend values is a constant, but with the additional
provisos (1) that the trend is upward and (2) that the second differences
are positive. Now, a curve of the type Yc = ab^X indicates a constant
ratio of change, and, if such a curve were to be fitted to the data of Chart
13.4, it is clear that the ratio would be greater than 1.0 rather than less
than 1.0. That is to say, the series is increasing. The data of asphalt
production have been plotted on semi-logarithmic paper in Chart 13.5,
and it appears that the trend, which was not linear in Chart 13.4, is now
linear. This indicates the suitability of the equation type Yc = ab^X, the
exponential curve.

Chart 13.4. United States Production of Asphalt from Petroleum, 1941- 
1952, and Trend as Shown by a Straight Line Fitted to the Logarithms of the 
Data. Note that this chart has an arithmetic vertical scale and that the trend line is 
slightly curved. Data of Table 13.2. 

It is not possible to fit the exponential curve directly to the Y values 
by least squares; we can, however, make a least-squares fit to the loga- 
rithms of the original data,⁴ and this results in minimizing the squared 
deviations of the logarithms of the observed values from the logarithmic 
trend values. Putting the exponential equation in logarithmic form gives 

$$\log Y_c = \log a + X \log b,$$

which is a straight line in terms of X and log Y. The normal equations 


⁴ This equation may be fitted to the Y values by a method described in James W. 
Glover, Tables of Applied Mathematics in Finance, Insurance, Statistics, George 
Wahr, Ann Arbor, Mich., 1923, pp. 463-481. Glover's method results in a and b values 
such that $\sum Y_c = \sum Y$ and $\sum XY_c = \sum XY$, with the origin taken at the first year. It is 
not a least-squares fit. 
Chart 13.5. United States Production of Asphalt from Petroleum, 1941- 
1952, and Trend as Shown by a Straight Line Fitted to the Logarithms of the 
Data. Note that this chart has a logarithmic vertical scale and that the trend is 
linear. Data of Table 13.2. 

are 

$$
\begin{aligned}
\text{I.}\quad & \sum \log Y = N \log a + \log b \sum X. \\
\text{II.}\quad & \sum X \log Y = \log a \sum X + \log b \sum X^2.
\end{aligned}
$$

Since the X origin may be taken at the middle of the period, $\sum X = 0$; so 
these equations may be written 

$$
\begin{aligned}
\text{I.}\quad & \sum \log Y = N \log a. \\
\text{II.}\quad & \sum X \log Y = \log b \sum X^2.
\end{aligned}
$$

Using the summations shown in Table 13.2 and getting $\sum X^2$ from 
Appendix C, we have 

$$
\begin{aligned}
\text{I.}\quad & 47.145300 = 12 \log a, \\
& \log a = 3.928775. \\
\text{II.}\quad & 8.025212 = 572 \log b, \\
& \log b = 0.0140301.
\end{aligned}
$$

The trend equation in logarithmic form is 

$$\log Y_c = 3.928775 + 0.0140301X.$$

Origin, 1946-1947; X units, ½ year. 

To obtain a and b, we look up the anti-logarithms of log a and log b and 
we can then write the trend equation in natural form: 

$$Y_c = (8{,}487.4)(1.0328)^X.$$

Origin, 1946-1947; X units, ½ year. 

The log $Y_c$ values and the $Y_c$ values for each year are shown in the last 
two columns of Table 13.2. The $Y_c$ trend values are shown on both 

TABLE 13.2 

Computation of Values for Fit of Straight Line to Logarithms of United 
States Production of Asphalt from Petroleum, 1941-1952 
(Thousands of short tons) 

                  Production                               Trend values
Year       X          Y          Log Y       X log Y     Log Y_c       Y_c

1941     -11        6,558      3.816771    -41.984481    3.774444     5,949
1942      -9        6,296      3.799065    -34.191585    3.802504     6,346
1943      -7        6,757      3.829754    -26.808278    3.830564     6,770
1944      -5        6,996      3.844850    -19.224250    3.858624     7,221
1945      -3        7,127      3.852907    -11.558721    3.886685     7,703
1946      -1        8,166      3.912009     -3.912009    3.914745     8,218
1947       1        8,962      3.952405      3.952405    3.942805     8,766
1948       3        9,440      3.974972     11.924916    3.970865     9,351
1949       5        8,910      3.949878     19.749390    3.998926     9,975
1950       7       10,589      4.024855     28.173985    4.026986    10,641
1951       9       12,055      4.081167     36.730503    4.055046    11,351
1952      11       12,784      4.106667     45.173337    4.083106    12,109

Total      0         ...      47.145300      8.025212       ...        ...

Data from U. S. Department of Commerce, Office of Business Economics, Business Statistics, 
Biennial Edition, p. 174. 

Charts 13.4 and 13.5. To draw the trend on Chart 13.5, it was merely 
necessary to obtain the $Y_c$ values for 1941 and for 1952, to plot these two 
values, and to connect them with a straight line. Drawing the trend on 
Chart 13.4 requires plotting all, or nearly all, of the trend values. 
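The fit just described can be checked with a short Python sketch. It takes the production figures from Table 13.2, applies the two simplified normal equations, and converts the constants back to natural form. The variable names are illustrative, and the logarithms are computed directly rather than read from a table, so the last decimal place may differ slightly from the printed values.

```python
import math

# Production of asphalt, thousands of short tons, 1941-1952 (Table 13.2)
production = [6558, 6296, 6757, 6996, 7127, 8166,
              8962, 9440, 8910, 10589, 12055, 12784]

N = len(production)
# Origin midway between 1946 and 1947; X in half-year units: -11, -9, ..., 11
xs = [2 * i - (N - 1) for i in range(N)]
logs = [math.log10(y) for y in production]

# Normal equations with sum(X) = 0:
#   I.  sum(log Y)   = N log a
#   II. sum(X log Y) = log b * sum(X^2)
log_a = sum(logs) / N
log_b = sum(x * ly for x, ly in zip(xs, logs)) / sum(x * x for x in xs)

a = 10 ** log_a            # trend value at the origin; also the geometric mean of Y
b = 10 ** log_b            # ratio of change per X unit (half a year)
annual_ratio = b ** 2      # two X units make one year

trend = [a * b ** x for x in xs]
log_residual_sum = sum(ly - (log_a + log_b * x) for x, ly in zip(xs, logs))
```

Since the fit is made to the logarithms, `log_residual_sum` comes out as zero apart from rounding; the deviations measured in original units do not cancel in the same way.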

The trend equation, written in the form 

$$Y_c = (8{,}487.4)(1.0328)^X,$$

tells us that 8,487.4 thousand short tons was the trend value for a point midway 
between 1946 and 1947, and that, during the period under consideration, 
the trend of asphalt production grew 3.28 per cent per half year (the X 
unit), or about 6.67 per cent per year. Incidentally, 8,487.4 thousand short 
tons is the geometric mean of the Y values. Since 
the geometric mean is always a little smaller than the arithmetic mean, 
and since the sum of the squares of the deviations of the logarithms 
(rather than of the original data) is at a minimum for this trend, it follows 
that the sum of the deviations above the trend line of Chart 13.4 is 
slightly larger than the sum of those below it. This constitutes a minor 
shortcoming of this type of trend. However, the measured deviations on 
either side of the trend line in Chart 13.5 do cancel. In addition, there 

Chart 13.6. Domestic Consumption of Rayon Filament Yarn, 1920-1952, 
and Trend as Shown by a Second-Degree Curve Fitted to the Logarithms of 
the Data. Data of Table 13.3. 

is some merit in the fact that the use of logarithms equalizes the impor- 
tance of fluctuations in regard to their relative, rather than in regard to 
their absolute, deviations from the trend. This is particularly pertinent 
when there are small cyclical variations about the lower portion of the 
trend and larger (that is, larger absolutely) cyclical variations about the 
upper part of the trend. In such a situation, the trend line is more likely 
to pass through all of the cycles rather than through only the larger ones. 
This point may more than offset the technical disadvantage of fitting 
to the logarithms. 
Second-degree curve fitted to logarithms. Sometimes data are 
encountered which, when plotted on semi-logarithmic paper, continue to 
show curvature, being concave either upward or downward. Chart 13.6 
and Table 13.3 show such a series, the domestic consumption of rayon 
filament yarn for 1920-1952, which is concave downward, indicating that 
the ratio of increase has been decreasing. We may fit a second-degree 
curve to the logarithms of the values, using 

$$\log Y_c = \log a + X \log b + X^2 \log c.$$

Taking the X origin at the middle of the period, the three normal equa- 
tions are 

$$
\begin{aligned}
\text{I.}\quad & \sum \log Y = N \log a + \log c \sum X^2. \\
\text{II.}\quad & \sum X \log Y = \log b \sum X^2. \\
\text{III.}\quad & \sum X^2 \log Y = \log a \sum X^2 + \log c \sum X^4.
\end{aligned}
$$

From Appendix B we ascertain that $\sum X^2 = 2(1{,}496) = 2{,}992$ and 
$\sum X^4 = 2(234{,}848) = 469{,}696$. All of the other values may be had from 
Table 13.3, and we solve the normal equations as follows: 

$$
\begin{aligned}
\text{II.}\quad & \sum X \log Y = \log b \sum X^2. \\
& 160.140215 = 2{,}992 \log b. \\
& \log b = 0.0535228. \\[4pt]
\text{I.}\quad & \sum \log Y = N \log a + \log c \sum X^2. \\
\text{III.}\quad & \sum X^2 \log Y = \log a \sum X^2 + \log c \sum X^4. \\[4pt]
\text{I.}\quad & 76.269235 = 33 \log a + 2{,}992 \log c. \\
\text{III.}\quad & 6{,}582.178801 = 2{,}992 \log a + 469{,}696 \log c. \\[4pt]
(\text{I} \times 90.666667).\quad & 6{,}915.077332 = 2{,}992 \log a + 271{,}274.67 \log c. \\
\text{III.}\quad & 6{,}582.178801 = 2{,}992 \log a + 469{,}696 \log c. \\
& 332.898531 = -198{,}421.33 \log c. \\
& \log c = -0.00167774. \\[4pt]
\text{I.}\quad & 76.269235 = 33 \log a + (2{,}992)(-0.00167774). \\
& 33 \log a = 81.289033. \\
& \log a = 2.463304. \\[4pt]
\text{Check, using III.}\quad & 6{,}582.178801 = (2{,}992)(2.463304) + (469{,}696)(-0.00167774) \\
& \phantom{6{,}582.178801} = 6{,}582.178801.
\end{aligned}
$$

Trend equation: $\log Y_c = 2.463304 + 0.0535228X - 0.00167774X^2$. 
Origin, 1936. X units, 1 year. 
TABLE 13.3 

(The body of this table is not legible in the source; only the totals row and the source note are recoverable.) 

Total:  $\sum \log Y$ = 76.269235;  $\sum X$ = 0;  $\sum X \log Y$ = 160.140215;  $\sum X^2$ = 2,992;  $\sum X^2 \log Y$ = 6,582.178801. 

Data from Textile Economics Bureau, Inc., Textile Organon, Vol. XXIV, No. 2, February 1953, p. 20. 
The procedure for computing the trend values is indicated in Table 
13.3. The trend is shown graphically in Chart 13.6. Two comments 
are in order concerning this trend: first, it is not a particularly good 
description of the series; second, the trend, if extended, would begin to 
drop off in 1953! Two trends, one fitted to data for 1920-1937 and the 
other to 1938-1952 data, might be a better description. A Gompertz 
curve (see Charts 13.10 and 13.11) is a much better description and does 
not turn down. 
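The solution of the three normal equations, and the remark that the fitted trend turns down shortly after the end of the period, can be traced with the sketch below. It uses the sums exactly as printed in the text; the function name `solve_log_parabola` is hypothetical. One caution: summing $X^4$ directly for X = -16, ..., 16 gives 487,696 rather than the printed 469,696, so the printed figure may contain a transposition. The sketch keeps the printed value so that it reproduces the constants shown above; with the directly summed value, log a and log c would differ somewhat.

```python
def solve_log_parabola(n, sum_log_y, sum_x_log_y, sum_x2_log_y, sum_x2, sum_x4):
    """Solve normal equations I-III for log a, log b, log c (origin at mid-period)."""
    # II involves log b alone
    log_b = sum_x_log_y / sum_x2
    # Multiply I by sum_x2 / n and subtract from III to eliminate log a
    factor = sum_x2 / n
    log_c = (sum_x2_log_y - factor * sum_log_y) / (sum_x4 - factor * sum_x2)
    # Substitute log c back into I
    log_a = (sum_log_y - sum_x2 * log_c) / n
    return log_a, log_b, log_c


# Sums as printed in the text (33 years, origin 1936)
log_a, log_b, log_c = solve_log_parabola(
    n=33,
    sum_log_y=76.269235,
    sum_x_log_y=160.140215,
    sum_x2_log_y=6582.178801,
    sum_x2=2992,
    sum_x4=469696,
)

# A parabola in X reaches its maximum where the slope is zero
peak_x = -log_b / (2 * log_c)
peak_year = 1936 + peak_x

# Direct summation of X^4 over the 33 years, for comparison with the printed sum
direct_sum_x4 = sum(x ** 4 for x in range(-16, 17))
```

With the printed sums, `peak_year` falls at about 1952, which agrees with the statement that the extended trend would begin to decline in 1953.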


ASYMPTOTIC GROWTH CURVES 

The straight line $Y_c = a + bX$, which was discussed in the preceding 
chapter, describes a constant amount of increase or decrease. The 
exponential curve, $Y_c = ab^X$, involves a constant ratio of change and, 
therefore, a constant ratio of change in the amount of change. If b is a 
positive number greater than one, the trend is upward and the amount of 
change is undergoing a constant percentage of increase; if b is a positive 
number smaller than one, the trend is downward and the amount of 
change shows a constant percentage of decrease. 

Over long periods of time, chronological series are not likely to show 
either a constant amount of change or a constant ratio of change. It is 
much more likely that an increasing⁵ series will show an increasing amount 
of change but a decreasing ratio of change. This is true of the data of 
Charts 13.10 and 13.11, which show domestic consumption of rayon 
filament yarn. 

It is also possible that an increasing series may show a decline in the 
amount of increase. Decreasing absolute growth is not often encoun- 
tered, but we shall discuss one curve of this type, the modified exponen- 
tial, since it serves as an excellent introduction to the more important 
Gompertz and logistic curves. Before beginning a consideration of the 
modified exponential curve, passing mention may be made of three other 
curve types which may describe a decreasing amount of growth. These 
are: 

(1) Modified polynomials, such as Fc — a + Fc = a 4* hX^ + cX, 
and others. When three or more constants are present, one (or more) 
constants may be negative, in which case the curve may ultimately turn 
down. 

(2) Straight line to log X. The expression is $Y_c = a + b \log X$. 
This curve type should not be used unless there is a logical justification 
for considering the logarithms of time. 

(3) A parabolic curve to log Y, which is written $\log Y_c = aX^b$, may 
be fitted by least squares by writing it $\log \log Y_c = \log a + b \log X$. 

Note that, when using the logarithm of X, the X origin cannot be taken 
at the middle of the period. 

⁵ Series which are declining may show a decreasing amount of change. The de- 
creasing amount of change may represent a decreasing or constant (but usually 
decreasing) ratio of change. To avoid possible confusion, most of the discussion 
concerning asymptotic growth curves will deal with increasing series. 

The modified exponential. This curve not only describes a trend in 
which the amount of growth declines by a constant percentage, but the 
curve also approaches an upper limit, called the asymptote. This is an 
important property of growth curves, since many time series seem to 
approach an upper limit. The equation of the modified exponential is 

$$Y_c = k + ab^X,$$

where k is the asymptote. 

TABLE 13.4 

Hypothetical Data for Modified Exponential Curve 
(Asymptote k = 114) 

                       Partial        Y          Per cent of
   X         Y         totals      increment     preceding increment
  (1)       (2)         (3)          (4)            (5)

   0       50
   1       66        116.0000      16
   2       78                      12              75
   3       87        165.0000       9              75
   4       93.75                    6.75           75
   5       98.8125   192.5625       5.0625         75

As noted in footnote 5, we shall give our attention primarily to increas- 
ing series, but Chart 13.7 shows four shapes which this equation may 
assume. It must be clear that our interest centers on part 1 of Chart 
13.7, since that is the only one of the four which represents an increasing 
series with an upper asymptote. There are occasions when one might 
wish to use a trend like that in part 3 of Chart 13.7. This would be 
true for a declining series tending to have a constant percentage of 
decrease in the amount of decrease. Death rates from a specific disease 
may behave in this fashion. 

The reader may find it illuminating to substitute various values for k, 
a, and b in the equation for the modified exponential and to draw for 
himself curves like those shown in Chart 13.7. This will provide him 
with specific illustrations of the situations stated generally in that chart. 
Note that negative values of b are of no interest to us. 
The first two columns of Table 13.4 show a series which has a constant 
percentage decrease in its amount of growth. As can be seen in Columns 
4 and 5, each first difference is 75 per cent of the preceding first difference. 
The increments of increase are $\Delta_1$, $\Delta_2$, $\Delta_3$, $\Delta_4$, and $\Delta_5$, and 

$$\frac{\Delta_2}{\Delta_1} = \frac{\Delta_3}{\Delta_2} = \frac{\Delta_4}{\Delta_3} = \frac{\Delta_5}{\Delta_4} = 0.75.$$


Referring to Chart 13.8, the horizontal broken line near the top of the 
chart is the value k that the curve of this series approaches; in this case k 
is 114. This means that, if we should extend the trend line indefinitely, 
it would approach closer and closer to this value, but never quite equal it. 
The second constant, a, the value obtained by subtracting the asymptote 
k from the trend value when X is zero, in this instance is -64. The 
third constant, b, is, of course, the ratio between successive increments of 
growth, or 0.75 for this series. In Chart 13.8 the vertical broken line 
when X = 1 is $-64(0.75) = -48$; when X = 2, it is $-64(0.75)^2 = -36$; 
and so on for the other values of X. Thus these vertical broken lines are 
described by the expression $ab^X$. This is true when X = 0 also, since 
$-64(0.75)^0 = -64$. In the diagram, $ab^X$ is represented by the height of 
the shaded area. If now, in turn, we subtract from k the value of each 
of the vertical broken lines, we have the trend values. The vertical 
broken lines are subtracted from k because the sign of a is negative. 
Thus: 

   X        k + ab^X              Y_c

   0       114 - 64          =   50
   1       114 - 48          =   66
   2       114 - 36          =   78
   3       114 - 27          =   87
   4       114 - 20.25       =   93.75
   5       114 - 15.1875     =   98.8125

Since the sign of a is negative, the increments of growth are declining. 
As is already obvious, for this series of data the equation is 
$Y_c = 114 - 64(0.75)^X$. 

Chart 13.7. Four Forms of the Modified Exponential Curve, $Y_c = k + ab^X$. 
(1) When a is negative and b is less than one. (2) When a is negative and b is 
greater than one. (3) When a is positive and b is less than one. (4) When a is 
positive and b is greater than one. 

Chart 13.8. A Modified Exponential Equation Fitted to the Data 
of Table 13.4. 
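As a quick check of the relationships just described, the following sketch evaluates $Y_c = 114 - 64(0.75)^X$ for X = 0 to 5 and confirms that each increment is 75 per cent of the one before it. The function name is illustrative only.

```python
def modified_exponential(x, k, a, b):
    """Trend value of the modified exponential Y_c = k + a * b**x."""
    return k + a * b ** x


k, a, b = 114, -64, 0.75
values = [modified_exponential(x, k, a, b) for x in range(6)]

# First differences (increments of growth) and the ratio of each to its predecessor
increments = [later - earlier for earlier, later in zip(values, values[1:])]
ratios = [d2 / d1 for d1, d2 in zip(increments, increments[1:])]

# Distance still remaining to the asymptote; it shrinks by the factor b each period
gaps = [k - y for y in values]
```

The list `values` reproduces Column 2 of Table 13.4, and `gaps` gives the heights of the vertical broken lines of Chart 13.8.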

This curve has three constants: k, the asymptote; a, the distance 
between the value of $Y_c$ when X = 0 and the asymptote; and b, the ratio 
between successive first differences. Three equations are therefore 
required for fitting it. These are obtained by first dividing the data into 
three equal sections, as in Table 13.4. Then the Y values are totaled 
for each section, as in Column 3. The results are: 

For the first third, $\Sigma_1 Y = 116$. 

For the second third, $\Sigma_2 Y = 165$. 

For the third third, $\Sigma_3 Y = 192.5625$. 

Let us note what 116 represents in terms of our equation. It is the sum 
of 50 + 66. But 50 is $k + ab^0$ and 66 is $k + ab^1$; so 

$$116 = 2k + a + ab.$$

This is Equation I. The other two are obtained in similar fashion. The 
three equations are: 

$$
\begin{aligned}
\text{I.}\quad & 116 = 2k + a + ab. \\
\text{II.}\quad & 165 = 2k + ab^2 + ab^3. \\
\text{III.}\quad & 192.5625 = 2k + ab^4 + ab^5.
\end{aligned}
$$

In order to solve for b, we first subtract Equation I from Equation II, 
obtaining Equation A; and then subtract Equation II from Equation III, 
obtaining Equation B. Thus: 

$$
\begin{aligned}
\text{A.}\quad 49 &= ab^2 + ab^3 - ab - a \\
&= a(b^3 + b^2 - b - 1). \\
\text{B.}\quad 27.5625 &= ab^4 + ab^5 - ab^2 - ab^3 \\
&= ab^2(b^3 + b^2 - b - 1).
\end{aligned}
$$

The constant b is now obtained by dividing Equation B by Equation A. 
We shall call the resulting equation C. 

$$
\begin{aligned}
\text{C.}\quad \frac{27.5625}{49} &= \frac{ab^2(b^3 + b^2 - b - 1)}{a(b^3 + b^2 - b - 1)}. \\
b^2 &= 0.5625. \\
b &= 0.75.
\end{aligned}
$$

The value of a may now be gotten by substituting in Equation A or B. 

$$
\begin{aligned}
\text{A.}\quad 49 &= a(0.75^3 + 0.75^2 - 0.75 - 1). \\
a &= \frac{49}{-0.765625} = -64.
\end{aligned}
$$
The remaining constant k may be computed by substituting the values 
of a and b in any of the original equations. 

$$
\begin{aligned}
\text{I.}\quad 116 &= 2k - 64 - 64(0.75). \\
2k &= 228. \\
k &= 114.
\end{aligned}
$$

The values of the constants are thus found to be those which we knew to 
be correct. The equation was not obtained by the method of least 
squares, but was so fitted that the three partial totals of the trend values 
were the same as those of the original data. In this case, since the original 
data conform to the equation type perfectly, the fitted curve passes 
through all of the original points. 

The logical procedure, which has been explained, can be developed into 
more convenient formulas, which are as follows:⁶ 

$$
\begin{aligned}
b^n &= \frac{\Sigma_3 Y - \Sigma_2 Y}{\Sigma_2 Y - \Sigma_1 Y}, \\
a &= (\Sigma_2 Y - \Sigma_1 Y)\,\frac{b - 1}{(b^n - 1)^2}, \\
k &= \frac{1}{n}\left[\Sigma_1 Y - \left(\frac{b^n - 1}{b - 1}\right) a\right],
\end{aligned}
$$

where n is the number of years in each third of the data. Solving by 
these formulas requires, of course, that b be obtained first, then a, and 
finally k. 

If the expressions for a and b are substituted in the expression just 
given for k, we obtain 

$$k = \frac{1}{n}\left[\frac{(\Sigma_1 Y)(\Sigma_3 Y) - (\Sigma_2 Y)^2}{\Sigma_1 Y + \Sigma_3 Y - 2\,\Sigma_2 Y}\right],$$

which enables us to obtain the asymptote without first computing a and b. 

Since time series do not often behave in such a manner that a modified 
exponential is a logical fit or a good description of the series, no illustration 
is given of the fit of $Y_c = k + ab^X$ to a set of actual data. As noted 
earlier, the treatment of the modified exponential curve is intended as an 
introduction to the two other growth curves to be discussed in the follow- 
ing pages. 
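The convenient formulas can be written as a small Python routine. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the series length is assumed to be divisible by three, and $b^n$ is assumed positive so that its n-th root is real. It is applied to the hypothetical data of Table 13.4.

```python
def fit_modified_exponential(ys):
    """Fit Y_c = k + a * b**X by the method of partial totals (X = 0, 1, 2, ...)."""
    if len(ys) % 3 != 0:
        raise ValueError("the series must divide into three equal sections")
    n = len(ys) // 3
    s1, s2, s3 = sum(ys[:n]), sum(ys[n:2 * n]), sum(ys[2 * n:])
    b_n = (s3 - s2) / (s2 - s1)
    if b_n <= 0:
        raise ValueError("partial totals do not give a positive b**n")
    b = b_n ** (1 / n)                              # b is obtained first
    a = (s2 - s1) * (b - 1) / (b_n - 1) ** 2        # then a
    k = (s1 - (b_n - 1) / (b - 1) * a) / n          # and finally k
    return k, a, b


def asymptote_from_totals(s1, s2, s3, n):
    """Asymptote k obtained directly from the three partial totals."""
    return (s1 * s3 - s2 ** 2) / (s1 + s3 - 2 * s2) / n


table_13_4 = [50, 66, 78, 87, 93.75, 98.8125]
k, a, b = fit_modified_exponential(table_13_4)
k_direct = asymptote_from_totals(116, 165, 192.5625, n=2)
```

Both routes give k = 114, which serves as the same kind of check the text recommends for the Gompertz curve.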

⁶ The derivation of these formulas is given in Appendix S, section 13.1. 

The Gompertz curve. In the form which is of primary concern to 
us, the Gompertz curve describes a trend in which the growth increments 
of the logarithms are declining by a constant percentage. Thus, the 
natural values of the trend would show a declining ratio of increase, but 
the ratio does not decrease by either a constant amount or a constant per- 
centage. The equation for the Gompertz curve is 

$$Y_c = k a^{b^X},$$

which may be put in logarithmic form 

$$\log Y_c = \log k + (\log a)\,b^X.$$

Chart 13.9. Four Forms of the Gompertz Curve, $Y_c = k a^{b^X}$. The vertical 
values at the points marked (*) are antilog (log k + log a). 
(1) When log a is negative and b is less than one. (2) When log a is negative and b is 
greater than one. (3) When log a is positive and b is less than one. (4) When log a is 
positive and b is greater than one. 

The four parts of Chart 13.9 show four shapes which the Gompertz 
equation may assume. While the statistician might occasionally find 
use for the Gompertz curve to describe trends of the types shown⁷ in 
parts 2 and 3 of Chart 13.9, our major interest centers in the form shown 
in part 1 of the chart. This curve (and also the curve in part 2) has an 
upper and a lower asymptote, the lower asymptote being zero. Only 
positive values of b are considered in Chart 13.9, since negative values of 
b do not yield useful curves. 

⁷ Deaths of railway employees, accidents in factories, specific death rates, and other 
declining series might be described by a Gompertz curve having a lower asymptote at 
the right. Whether there is or is not an upper asymptote will depend upon the 
behavior of the data to which the curve is fitted. 

Whatever has been said about the behavior of the modified exponential 
curve applies also to the logarithmic form of the Gompertz curve. The 
Gompertz curves shown in Chart 13.9 would, if put in logarithmic form 
(or plotted on semi-logarithmic paper), look like the corresponding parts 
of Chart 13.7. The fitting of the Gompertz curve is to the logarithms of 
the observed data and may be accomplished in a manner⁸ exactly paral- 
leling the fit of the modified exponential. The expressions are 

$$
\begin{aligned}
b^n &= \frac{\Sigma_3 \log Y - \Sigma_2 \log Y}{\Sigma_2 \log Y - \Sigma_1 \log Y}, \\
\log a &= (\Sigma_2 \log Y - \Sigma_1 \log Y)\,\frac{b - 1}{(b^n - 1)^2}, \\
\log k &= \frac{1}{n}\left[\Sigma_1 \log Y - \left(\frac{b^n - 1}{b - 1}\right) \log a\right].
\end{aligned}
$$

If it is desired to obtain the value of k without first computing log a and 
b, use 

$$\log k = \frac{1}{n}\left[\frac{(\Sigma_1 \log Y)(\Sigma_3 \log Y) - (\Sigma_2 \log Y)^2}{\Sigma_1 \log Y + \Sigma_3 \log Y - 2\,\Sigma_2 \log Y}\right].$$

Using this expression first enables one quickly to ascertain if the upward 
trend has an upper asymptote; computing k in this manner also pro- 
vides a check of the value of the k obtained by the formula first given. 
Whether or not there is an upper asymptote for an increasing series may 
also be ascertained by noting if ($\Sigma_3 \log Y - \Sigma_2 \log Y$) is greater than or 
less than ($\Sigma_2 \log Y - \Sigma_1 \log Y$). If the first of these differences exceeds 
the second, $b^n$ (and, therefore, b) is greater than one, and there is 
no upper asymptote for the increasing series; the curve of such an increas- 
ing series would resemble that shown in part 4 of Chart 13.9. If the first 
difference is less than the second, b is less than one, and the curve of an 
increasing series would look like part 1 of Chart 13.9. 
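The preliminary test for an upper asymptote can be sketched as follows. The function name is hypothetical. The first call uses the three partial sums of log Y for the rayon series (Table 13.5); the second uses made-up sums, labeled as such, to show the contrary case.

```python
def upper_asymptote_check(s1, s2, s3, n):
    """For an increasing series, return (has_upper_asymptote, log_k).

    s1, s2, s3 are the partial sums of log Y for the three thirds of the data,
    and n is the number of years in each third.
    """
    later_gain = s3 - s2       # gain from the second third to the last third
    earlier_gain = s2 - s1     # gain from the first third to the second third
    b_n = later_gain / earlier_gain
    # Direct expression for log k; it is the upper asymptote only when b_n < 1
    log_k = (s1 * s3 - s2 ** 2) / (s1 + s3 - 2 * s2) / n
    return b_n < 1, log_k


# Partial sums of log Y for rayon filament yarn, 1920-1952 (Table 13.5)
rayon_has_limit, rayon_log_k = upper_asymptote_check(
    18.504346, 26.536108, 31.228781, n=11)

# Hypothetical sums in which the later gain exceeds the earlier gain
other_has_limit, _ = upper_asymptote_check(10.0, 14.0, 19.0, n=11)
```

For the rayon data the later gain (4.692673) is smaller than the earlier gain (8.031762), so an upper asymptote exists, and `rayon_log_k` comes to about 3.4385.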

The data of Table 13.5, which are shown also in Charts 13.10 and 13.11, 
will serve as the basis for an illustration of the fit of the Gompertz curve. 
The computation of the required sums of the logarithms is carried out in 
the fourth column of Table 13.5. Using the expressions previously 

⁸ A number of Gompertz curves, fitted by a method different from that described in 
this text, may be seen in Growth Patterns in Industry, National Industrial Conference 
Board, New York, 1952. 
Chart 13.11. Domestic Consumption of Rayon Filament Yarn, 1912-1952, and Trend as Shown by a Gompertz Curve 
Fitted to Data for 1920-1952. Note that this chart has a logarithmic vertical scale. The Gompertz curve has been extended to show 
the general shape of the curve. Data for 1920-1952 from Table 13.5; data for 1912-1919 from the source given below that table. 
TABLE 13.5 

Computation of Values for Fit of Gompertz Curve to Domestic Consump- 
tion of Rayon Filament Yarn, 1920-1952 
(Millions of pounds) 

                  Consump-                   Computation of trend values
                   tion
Year         X       Y        Log Y        b^X       (log a) b^X    Log Y_c*      Y_c

1920         0      8.7     0.939519    1.0000000    -2.215732     1.222791      16.7
1921         1     19.8     1.296665    0.9523195    -2.110085     1.328438      21.3
1922         2     24.7     1.392697    0.9069124    -2.009475     1.429048      26.9
1923         3     32.5     1.511883    0.8636704    -1.913662     1.524861      33.5
1924         4     42.2     1.625312    0.8224902    -1.822418     1.616105      41.3
1925         5     58.2     1.764923    0.7832735    -1.735524     1.702999      50.5
1926         6     60.6     1.782473    0.7459266    -1.652774     1.785749      61.1
1927         7    100.0     2.000000    0.7103604    -1.573969     1.864554      73.2
1928         8    100.1     2.000434    0.6764901    -1.498921     1.939602      87.0
1929         9    131.5     2.118926    0.6442347    -1.427452     2.011071     102.6
1930        10    117.9     2.071514    0.6135173    -1.359390     2.079133     120.0
Σ1 log Y                   18.504346                              18.504351 ✓

1931        11    157.3     2.196729    0.5842645    -1.294574     2.143949     139.3
1932        12    152.0     2.181844    0.5564065    -1.232848     2.205675     160.6
1933        13    211.8     2.325926    0.5298768    -1.174065     2.264458     183.8
1934        14    194.8     2.289589    0.5046120    -1.118085     2.320438     209.1
1935        15    252.7     2.402605    0.4805518    -1.064774     2.373749     236.5
1936        16    297.6     2.473633    0.4576388    -1.014005     2.424518     265.8
1937        17    267.1     2.426674    0.4358184    -0.965657     2.472866     297.1
1938        18    274.1     2.437909    0.4150384    -0.919614     2.518909     330.3
1939        19    359.8     2.556061    0.3952492    -0.875766     2.562757     365.4
1940        20    388.7     2.589615    0.3764035    -0.834009     2.604514     402.8
1941        21    452.4     2.655523    0.3584564    -0.794243     2.644280     440.8
Σ2 log Y                   26.536108                              26.536113 ✓

1942        22    468.8     2.670988    0.3413650    -0.756373     2.682150     481.0
1943        23    494.2     2.693903    0.3250885    -0.720309     2.718214     522.7
1944        24    539.1     2.731669    0.3095881    -0.685964     2.752559     565.7
1945        25    602.4     2.779885    0.2948268    -0.653257     2.785266     609.9
1946        26    666.5     2.823800    0.2807693    -0.622110     2.816413     655.3
1947        27    729.3     2.862906    0.2673821    -0.592447     2.846076     701.6
1948        28    846.7     2.927730    0.2546332    -0.564199     2.874324     748.7
1949        29    782.7     2.893595    0.2424922    -0.537298     2.901225     796.6
1950        30    955.5     2.980231    0.2309301    -0.511679     2.926844     845.0
1951        31    865.4     2.937217    0.2199192    -0.487282     2.951241     893.8
1952        32    845.0     2.926857    0.2094333    -0.464048     2.974475     942.9
Σ3 log Y                   31.228781                              31.228787 ✓

* Log Y_c = log k + (log a) b^X. 

Data from Textile Economics Bureau, Inc., Textile Organon, Vol. XXIV, No. 2, February 1953, 
p. 20. 
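given, we obtain 

$$
\begin{aligned}
b^n &= \frac{\Sigma_3 \log Y - \Sigma_2 \log Y}{\Sigma_2 \log Y - \Sigma_1 \log Y}
     = \frac{31.228781 - 26.536108}{26.536108 - 18.504346}
     = \frac{4.692673}{8.031762} = 0.58426445. \\[4pt]
\log b^n &= 9.76660946 - 10 = 109.76660946 - 110. \\
\log b &= 9.97878268 - 10. \\
b &= 0.95231950. \\[4pt]
\log a &= (\Sigma_2 \log Y - \Sigma_1 \log Y)\,\frac{b - 1}{(b^n - 1)^2}
        = 8.031762\;\frac{-0.04768050}{(-0.41573555)^2} \\
       &= 8.031762\;\frac{-0.04768050}{0.17283605}
        = (8.031762)(-0.27587127) = -2.2157324. \\[4pt]
\log k &= \frac{1}{n}\left[\Sigma_1 \log Y - \left(\frac{b^n - 1}{b - 1}\right)\log a\right]
        = \frac{1}{11}\left[18.504346 - \left(\frac{-0.41573555}{-0.04768050}\right)(-2.2157324)\right] \\
       &= 3.438523.
\end{aligned}
$$

Check, using 

$$
\begin{aligned}
\log k &= \frac{1}{n}\left[\frac{(\Sigma_1 \log Y)(\Sigma_3 \log Y) - (\Sigma_2 \log Y)^2}{\Sigma_1 \log Y + \Sigma_3 \log Y - 2\,\Sigma_2 \log Y}\right] \\
       &= \frac{1}{11}\left[\frac{(18.504346)(31.228781) - (26.536108)^2}{18.504346 + 31.228781 - 2(26.536108)}\right] \\
       &= 3.438522.
\end{aligned}
$$

Trend equation: 

$$\log Y_c = 3.438522 - 2.2157324\,(0.9523195)^X.$$

$$Y_c = 2{,}744.9\,(0.00608509)^{(0.9523195)^X}.$$

Origin, 1920. X units, 1 year. 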
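The whole computation can be reproduced from the consumption figures of Table 13.5 with a short Python sketch. The function name `fit_gompertz` is hypothetical, and logarithms are computed directly rather than taken from a table, so the results agree with the text only to about five or six significant figures.

```python
import math

# Domestic consumption of rayon filament yarn, millions of pounds, 1920-1952
consumption = [
    8.7, 19.8, 24.7, 32.5, 42.2, 58.2, 60.6, 100.0, 100.1, 131.5, 117.9,
    157.3, 152.0, 211.8, 194.8, 252.7, 297.6, 267.1, 274.1, 359.8, 388.7, 452.4,
    468.8, 494.2, 539.1, 602.4, 666.5, 729.3, 846.7, 782.7, 955.5, 865.4, 845.0,
]


def fit_gompertz(ys):
    """Fit log Y_c = log k + (log a) * b**X by partial sums of log Y (X = 0, 1, ...)."""
    n = len(ys) // 3
    logs = [math.log10(y) for y in ys]
    s1, s2, s3 = sum(logs[:n]), sum(logs[n:2 * n]), sum(logs[2 * n:3 * n])
    b_n = (s3 - s2) / (s2 - s1)
    b = b_n ** (1 / n)
    log_a = (s2 - s1) * (b - 1) / (b_n - 1) ** 2
    log_k = (s1 - (b_n - 1) / (b - 1) * log_a) / n
    return log_k, log_a, b


log_k, log_a, b = fit_gompertz(consumption)
k = 10 ** log_k                                   # upper asymptote


def trend(x):
    """Trend value Y_c for X years after 1920."""
    return 10 ** (log_k + log_a * b ** x)


trend_1920 = trend(0)
trend_1952 = trend(32)
```

Evaluating `trend` for X = 0 through 32 gives the last column of Table 13.5.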

The natural form of the trend equation is obtained by looking up the 
anti-logarithms of log k and log a. Since log a = -2.2157324 is a nega- 
tive logarithm, it must be rewritten log a = 7.7842676 - 10 before the 
value of a = 0.00608509 can be obtained from Appendix R. Note that 
b = 0.9523195, which indicates that the ratio of increase each year is 
declining; more specifically, that each difference between successive 
logarithmic trend values is about 0.95 times (or 95 per cent of) the pre- 
ceding difference. Whenever b < 1, the value of b - 1 is negative, 
resulting in a negative value for log a, if $\Sigma_2 \log Y$ exceeds $\Sigma_1 \log Y$. 
(See the equation for log a.) If log a is negative, a is less than one. 

For our data, when X is zero (the value of X for 1920), $b^X = 1.0$ and 
$a^{b^X} = 0.00608509$, with the result that for 1920 $Y_c = (2{,}745)(0.00608509)$ 
= 16.7, the value shown for 1920 in the last column of Table 13.5. The 
greater the value of X, the smaller the value of $b^X$. As X increases, $b^X$ 
approaches zero and $a^{b^X}$ approaches 1.0, with the result that $Y_c$ approaches 
k, or 2,745, the upper asymptote. 

The procedure for computing the trend values is shown in Table 13.5. 
Note that $\Sigma_1 \log Y_c = \Sigma_1 \log Y$, $\Sigma_2 \log Y_c = \Sigma_2 \log Y$, and $\Sigma_3 \log Y_c$ 
$= \Sigma_3 \log Y$ to at least six digits. These agreements⁹ are noted by check 
marks in the column headed "Log $Y_c$." The trend values have been 
plotted on Charts 13.10 and 13.11 and have been extended in both direc- 
tions to indicate more clearly the shape of the fitted curve. The exten- 
sion of the trend to 1996 is not intended as a forecast, although the 
Gompertz curve is sometimes used to assist in making predictions. The 
asymptote is shown on both of the charts, and the approach of the trend 
to the asymptote is apparent. 

In Chart 13.10 it will be noticed that the amount of growth is small at 
first, then becomes larger until it reaches a point of inflection, after which 
it declines and finally approaches, but never reaches, zero. This general 
shape of the trend is common to many industries and has led Prescott¹⁰ 
to the conclusion that it describes a law of growth. According to 
Prescott, this trend is a function of population growth, the curve of 
which typically is similar in appearance, but it is also partly due to the 
development of the individual industry. He believes that the growth of 
an industry may be divided into four stages: 

(1) Period of experimentation, 

(2) Period of growth into the social fabric, 

(3) Through the point where growth increases but at a diminishing rate, 

(4) Period of stability. 
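The point of inflection mentioned above can be located for the fitted rayon trend. The sketch below uses the constants of the trend equation as given in the text and relies on a standard property of the Gompertz curve that the text does not state: the inflection occurs where the trend value equals k/e, that is, where $(\ln a)\,b^X = -1$. Variable names are illustrative.

```python
import math

# Constants of the fitted Gompertz trend (origin 1920, X in years)
LOG_K, LOG_A, B = 3.438522, -2.2157324, 0.9523195


def trend(x):
    """Trend value Y_c for X years after 1920."""
    return 10 ** (LOG_K + LOG_A * B ** x)


# Analytic location of the inflection: ln(a) * b**X = -1
ln_a = LOG_A * math.log(10)
x_inflection = math.log(-1 / ln_a) / math.log(B)
y_inflection = trend(x_inflection)

# Numerical cross-check: the year whose annual increment of the trend is largest
annual_growth = {1920 + x: trend(x + 1) - trend(x) for x in range(77)}
peak_growth_year = max(annual_growth, key=annual_growth.get)
```

On these constants the inflection falls a little more than 33 years after 1920, at a trend value of roughly 1,010 million pounds, which is where the annual increments of the extended trend stop rising and begin to shrink.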

These stages are not very specifically demarcated by Prescott, who also 
claims for this type of curve that it is useful in forecasting the future of an 
industry, since it not only is a logical curve but, on account of its tendency 
to flatten out, tends to be conservative in its forecasts. The horizontal 
dashed lines of Charts 13.10 and 13.11 would seem to indicate that the 
upper limit of rayon filament yarn consumption in the United States 
would be about 2,745 million pounds. While this does not appear, from 
the charts, to be an unreasonable figure, it may be too low if additional 
uses for rayon are found or it may be too high if other synthetic fibers 
supplant rayon. 

⁹ The values of log b and of b were obtained from a more extensive table of loga- 
rithms than the one given in Appendix R in order that these equalities might be close. 
Use of Appendix R, together with arithmetic interpolation, for log b and for b yields 
the same $Y_c$ values as in Table 13.5, but the agreement of the partial sums of the 
logarithms is not so exact. 

¹⁰ "Law of Growth in Forecasting Demand," by Raymond D. Prescott. Journal 
of the American Statistical Association, Vol. XVIII, December 1922, pp. 471-479. 

The logistic curve. This curve, which is also known as the Pearl- 
Reed curve, is, in its simplest form, 

~ = fc + ab^. 
y c 

From this expression it should be clear that it is merely a modified expo- 
nential in terms of the reciprocals of the Y values; the first differences 
of the reciprocals of the Fc values are declining by a constant percentage. 
A modified exponential could therefore be fitted, by the method of partial 
totals, to the reciprocals of the observed Y values, and the reciprocals of 
the fitted values so obtained taken as the trend values. However, this 
curve is more often written 

$$Y_c = \frac{k}{1 + 10^{a+bX}}$$

and, although the procedure is more subjective, fitted by the method of 
selected points. In this form, the logistic curve will always have an 
upper asymptote of k and a lower asymptote of zero; it looks like part 1 
or part 2 of Chart 13.9. In the form $\frac{1}{Y_c} = k + ab^X$, the logistic 
could assume forms similar to all four of those shown in Chart 13.9. 
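
The connection between the two forms can be checked directly. The short derivation below is a worked illustration, not part of the original text; the primes are an assumed notation, introduced here only because the text uses k, a, and b for the constants of both forms. 

$$
\frac{1}{Y_c} = \frac{1 + 10^{a+bX}}{k}
             = \frac{1}{k} + \frac{10^{a}}{k}\,\bigl(10^{b}\bigr)^{X}
             = k' + a'\,b'^{X},
\qquad k' = \frac{1}{k},\quad a' = \frac{10^{a}}{k},\quad b' = 10^{b}.
$$

When b is negative, b' lies between 0 and 1, so the reciprocal 1/Yc declines toward 1/k and Yc itself rises toward the upper asymptote k. 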

To fit the equation 

$$Y_c = \frac{k}{1 + 10^{a+bX}}$$

by the method of selected points requires choosing three years, x0, x1, and 
x2, equidistant from each other: one near the beginning of the period, one 
in the middle, and one near the end. The three selected values through 
which the fitted curve will pass are the Y values associated with these 
three years. These Y values are designated y0, y1, and y2. The origin 
on the X axis is at the year designated x0, and n is the number of X units 
(years, or decades when the data are decennial) from x0 to x1 or from x1 
to x2. The three constants are obtained as follows: 

$$k = \frac{2y_0y_1y_2 - y_1^2(y_0 + y_2)}{y_0y_2 - y_1^2},$$

$$a = \log\frac{k - y_0}{y_0},$$

$$b = \frac{1}{n}\log\frac{y_0(k - y_1)}{y_1(k - y_0)}.$$

Usually e = 2.71828 is used, instead of 10, in the denominator, giving 

$$Y_c = \frac{k}{1 + e^{a+bX}}.$$

The a values and the b values in the two forms will differ, but both forms describe 
the same curve, and the values are slightly easier to compute from the expression 
using 10 in the denominator. 


As an illustration, Table 13.6 shows the procedure for fitting a logistic 
curve to data of the population of Continental United States for 1810- 
1950. The population data are shown graphically in Chart 13.12. This 
period, including 15 decennial figures, was used instead of the entire 
period 1790-1950 in order that comparison could be made with the 
method of partial sums of reciprocals, mentioned previously. In Table 
13.6, the three selected points are 


y0, the geometric mean of the values for 1810, 1820, and 1830; 

y1, the geometric mean of the values for 1870, 1880, and 1890; and 

y2, the geometric mean of the values for 1930, 1940, and 1950. 


Consequently, x0 is at 1820, x1 at 1880, and x2 at 1940, as shown in the 
second column of Table 13.6. Averages of three decennial figures were 
used in order to minimize the effect of a single unusually high or low 
value; the geometric mean was used in preference to the arithmetic mean, 
since the population growth is more nearly a geometric progression than 
an arithmetic progression. The value of n is 6, the number of 10-year X units from 


For the mathematical reasoning behind this type of curve, see Raymond Pearl, 
Studies in Human Biology, Williams and Wilkins Company, Baltimore, 1924, Chapter 
XXIV. 

For 1810-1950 the method of partial sums yields k = 185.9 millions. The fit 
in Table 13.6 shows k = 190.3 for the method of selected points for 1810-1950. The 
method of selected points for 1790-1950 (using the geometric means of the first three, 
middle three, and last three years as those points) gives k = 189.9 millions. Several 
other methods of fitting a logistic curve are given in K. R. Nair, "The Fitting of 
Growth Curves," in Oscar Kempthorne, et al., editors, Statistics and Mathematics in 
Biology, The Iowa State College Press, Ames, Iowa, 1954, pp. 119-132. 
Chart 13.12. Population of Continental United States, 1810-1950, and 
Trend as Shown by a Logistic Curve. The logistic curve has been extended to 
show the general shape of the curve. Data of Table 13.6. 



TABLE 13.6 

Computation of Values for Fit of Logistic Curve to Data of Population of 
Continental United States, 1810-1950 

                 Popu-                 |---------- Computation of trend values ----------|
                 lation in                         Log μ =                         Yc =
                 millions                          1.274670 -                      190.293
Year   x    X    Y        y           0.1381596X  0.1381596X  μ        1 + μ    / (1 + μ)
(1)    (2)  (3)  (4)      (5)         (6)         (7)         (8)      (9)      (10)

1810        -1   7.2                  -0.138160   1.412830    25.87    26.87    7.1
1820   x0   0    9.6      9.6 (y0)    0           1.274670    18.82    19.82    9.6 ✓
1830        1    12.9                 0.138160    1.136510    13.69    14.69    13.0
1840        2    17.1                 0.276319    0.998351    9.962    10.962   17.4
1850        3    23.2                 0.414479    0.860191    7.248    8.248    23.1
1860        4    31.4                 0.552638    0.722032    5.273    6.273    30.3
1870        5    39.8                 0.690798    0.583872    3.836    4.836    39.3
1880   x1   6    50.2     50.2 (y1)   0.828958    0.445712    2.791    3.791    50.2 ✓
1890        7    62.9                 0.967117    0.307553    2.030    3.030    62.8
1900        8    76.0                 1.105277    0.169393    1.477    2.477    76.8
1910        9    92.0                 1.243436    0.031234    1.075    2.075    91.7
1920        10   105.7                1.381596    -0.106926   0.7818   1.7818   106.8
1930        11   122.8                1.519756    -0.245086   0.5687   1.5687   121.3
1940   x2   12   131.7    134.6 (y2)  1.657916    -0.383245   0.4138   1.4138   134.6 ✓
1950        13   150.7                1.796076    -0.521405   0.3010   1.3010   146.3

Data from U. S. Bureau of the Census, U. S. Census of Population: 1950, Vol. I, Number of 
Inhabitants, p. 1-3, Table 2. The revised population figure is shown above for 1870. The y values 
of Column 5 are geometric means of three values centered at x0, x1, and x2. The negative 
logarithms in Column 7 must be rewritten in their alternative forms with negative characteristic 
and positive mantissa (e.g. -0.106926 = 9.893074 - 10) before the values of μ can be obtained. 
Chart 13.13. Population of Continental United States, 1810-1950, and 
Trend as Shown by a Logistic Curve. The logistic curve has been extended to 
show the general shape of the curve. Note that this chart has a reciprocal vertical 
scale and that, owing to the compression of the upper part of the scale, the curve of 
the observed data and the trend line virtually coincide. Data of Table 13.6. 

x0 to x1 or from x1 to x2. Using the y0, y1, and y2 values shown in Table 
13.6, we obtain the values of k, a, and b as follows: 

$$k = \frac{2y_0y_1y_2 - y_1^2(y_0 + y_2)}{y_0y_2 - y_1^2}
    = \frac{2(9.6)(50.2)(134.6) - (50.2)^2(9.6 + 134.6)}{(9.6)(134.6) - (50.2)^2}
    = 190.293.$$

$$a = \log\frac{k - y_0}{y_0}
    = \log\frac{190.293 - 9.6}{9.6}
    = \log 18.822188
    = 1.274670.$$

$$b = \frac{1}{n}\log\frac{y_0(k - y_1)}{y_1(k - y_0)}
    = \frac{1}{6}\log\frac{9.6(190.293 - 50.2)}{50.2(190.293 - 9.6)}
    = \frac{1}{6}\log 0.14826636$$

$$\phantom{b} = \frac{1}{6}(9.17104244 - 10) = \frac{1}{6}(-0.82895756) = -0.1381596.$$

Trend equation: 

$$Y_c = \frac{190.293}{1 + 10^{(1.274670 - 0.1381596X)}}$$

Origin, 1820; X units, 10 years. 
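
The arithmetic above can be reproduced mechanically. The following is a minimal Python sketch of the method of selected points, using common logarithms as in the text; the function name `fit_logistic_selected_points` is hypothetical, introduced only for this illustration, and the three selected values are taken as given in Table 13.6 rather than recomputed. 

```python
import math


def fit_logistic_selected_points(y0, y1, y2, n):
    """Constants (k, a, b) of Yc = k / (1 + 10**(a + b*X)).

    y0, y1, y2 are the selected values at X = 0, n, and 2n
    (three equidistant points); n is measured in X units.
    """
    k = (2 * y0 * y1 * y2 - y1 ** 2 * (y0 + y2)) / (y0 * y2 - y1 ** 2)
    a = math.log10((k - y0) / y0)
    b = math.log10(y0 * (k - y1) / (y1 * (k - y0))) / n
    return k, a, b


# Selected values from Table 13.6; n = 6 because the X unit is 10 years.
k, a, b = fit_logistic_selected_points(9.6, 50.2, 134.6, n=6)
print(round(k, 3), round(a, 6), round(b, 7))
```

Carried to the same number of places, the results should agree with the k, a, and b shown above, apart from small differences in the last digit caused by rounding k before computing a and b by hand. 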


The computation of the trend values for this logistic equation is shown 
in the last five columns of Table 13.6. The procedure consists first of 
writing 

$$\mu = 10^{a+bX}$$

so that 

$$Y_c = \frac{k}{1 + \mu}.$$

In our equation, 

$$\mu = 10^{(1.274670 - 0.1381596X)}$$

and 

$$\log\mu = (\log 10)(1.274670 - 0.1381596X) = 1.0(1.274670 - 0.1381596X) = 1.274670 - 0.1381596X.$$

The values of μ are obtained in Columns 6, 7, and 8 of Table 13.6. In 
Column 9 of this table, the values of 1 + μ are shown, and the Yc values 
are obtained in Column 10. A check on the computations may be had by 
comparing the Yc values for 1820, 1880, and 1940 with the values of y0, y1, 
and y2, since the curve must pass through the three selected points. The 
check marks in Column 10 of Table 13.6 indicate that agreement is 
present. 
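
The column-by-column procedure of Table 13.6 can be sketched in a few lines of Python. This is an illustration only; the function name `logistic_trend` and its keyword arguments are hypothetical, and the default constants are the rounded k, a, and b of the trend equation above. 

```python
def logistic_trend(year, k=190.293, a=1.274670, b=-0.1381596,
                   origin=1820, unit=10):
    """Trend value Yc (millions) for a given calendar year."""
    X = (year - origin) / unit      # Column 3: X in 10-year units
    log_mu = a + b * X              # Column 7: log mu
    mu = 10 ** log_mu               # Column 8: mu
    return k / (1 + mu)             # Column 10: Yc = k / (1 + mu)


# The curve must pass through the three selected points (the check
# described in the text): compare with y0 = 9.6, y1 = 50.2, y2 = 134.6.
for year in (1820, 1880, 1940):
    print(year, round(logistic_trend(year), 1))
```

Working directly with 10 raised to a negative power avoids the step of rewriting negative logarithms with a negative characteristic and positive mantissa, which is needed only when μ is looked up in a printed table. 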

The trend values have been plotted in Charts 13.12 and 13.13, and the 
trend has been extended in both directions to show more clearly the 
fundamental shape of the curve. Note that the agreement between the 
observed data and the trend is so close that the two can hardly be dis- 
tinguished. Note, too, that Chart 13.13 uses a reciprocal vertical scale, 
and that in this chart the logistic curve is similar in appearance to the 
modified exponential curve. 

The logistic curve was mentioned in 1838, and later discussed more 
fully, by P. F. Verhulst. In 1920 it was developed independently by 
Raymond Pearl and Lowell J. Reed. It is not infrequently referred to as 
the Pearl-Reed curve. Pearl and Reed have used the curve to describe 
the growth of an albino rat and of a tadpole's tail, the number of yeast 
cells in a nutritive solution, the number of fruit flies in a bottle (on a 
limited food supply), and, most interesting of all, the number of human 
beings in a geographical area. In each case, the phenomenon measured 
is population growth, either the number of cells in an organism or the 
number of individuals in a region. The law of growth which the logistic 
curve describes is stated by Pearl as follows: 

In a spatially limited universe the amount of increase which occurs in 
any particular unit of time, at any point of the single cycle of growth, is 
proportional to two things, viz.: (a) the absolute size already attained at 
the beginning of the unit interval under consideration, and (b) the 
amount still unused or unexpended in the given universe (or area) of 
actual and potential resources for the support of growth. 
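
Pearl's statement can be connected to the fitted equation by a short calculation. The differential form below is a standard restatement offered as a sketch, not taken from the text; it treats X as continuous and uses the symbols of the equation Yc = k/(1 + 10^(a+bX)), with ln denoting the natural logarithm. 

$$
\frac{dY_c}{dX}
  = -\frac{k}{(1+\mu)^2}\,\frac{d\mu}{dX}
  = -\,b\,(\ln 10)\,\frac{k}{1+\mu}\cdot\frac{\mu}{1+\mu}
  = -\,b\,(\ln 10)\,\frac{Y_c\,(k - Y_c)}{k},
\qquad \mu = 10^{a+bX}.
$$

For negative b the rate of increase is therefore proportional both to the size already attained, Yc, and to the amount still unused, k - Yc, which are the two factors (a) and (b) of the quotation. 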

In the case of human populations, a new development may expand the 
available subsistence and allow a new cycle of growth. For instance, 
mankind may pass through a hunting stage, an agricultural stage, and an 
industrial stage. Each cultural epoch may then be described by a new 
logistic curve spliced onto the old one. Thus, 

$$Y_c = k_1 + \frac{k_2}{1 + 10^{a+bX}}$$

describes a curve in which k1 is the new lower limit and k1 + k2 the new 
upper limit. In this equation, k1 is below the upper limit k0 of the 
previous logistic and indicates the value at which the previous one was 
interrupted. 

Raymond Pearl, The Biology of Population Growth, Alfred A. Knopf, New York, 
1925, p. 22. See also Raymond Pearl, Introduction to Medical Biometry and Statistics, 
W. B. Saunders Company, Philadelphia and London, 1940, Third Edition, p. 459 f. 

Apparently waves of immigration and human institutions do not 
change the fundamental shape of the curve, although they may modify 
the steepness of its slope somewhat. Also, the growth may not be 
symmetrical: the point of inflection need not be halfway between the upper 
and the lower asymptotes, nor need the two parts of the curve be of the 
same shape. A skewed logistic may be obtained by a slight modification 
of the previous formulae, by writing 

$$Y_c = \frac{k}{1 + 10^{a+bX+cX^2}}.$$

The theory advanced by Raymond Pearl is not, however, universally 
accepted. Some argue that, although the logistic curve is appropriate 
enough for fruit flies in a bottle, its extension to human society is unwar- 
ranted. Human beings have, and exercise, the power of modifying their 
environment and rationally controlling their rate of reproduction. 

One use to which the logistic curve is sometimes put is to forecast the 
size of the future population. Forecasts based merely upon the extension 
of a curve are of dubious value, since they assume no important changes 
in any of the underlying influences on a series. The extended trend 
value of our logistic curve for 1960 is 156.1 million, which is almost cer- 
tainly too low. A trend such as we have fitted may also be used to esti- 
mate population for earlier years, when reliable records did not exist. 
Thus, the population of what is now the continental United States may be 
estimated from our equation to have been about 2.8 million in 1780. A 
better estimate for 1780 might have been obtained if we had included 
1790 and 1800 when determining the constants for the logistic equation. 

Comparison of the Gompertz and logistic curves. The Gompertz 
and logistic curves are similar in that they both can be used to describe an 
increasing series which is increasing by a decreasing percentage of growth, 
or a decreasing series which is decreasing by a decreasing percentage of 
decline. They differ in that the Gompertz curve involves a constant 
ratio of successive first differences of the log Yc values, while the logistic 
curve entails a constant ratio of successive first differences of the 1/Yc 
values. 

For the types of series to which we are interested in applying these 
curves, both have upper and lower asymptotes. 

See footnote 5 in Chapter 5. 

The first differences of the trend values of a Gompertz curve form a 
curve resembling a skewed frequency distribution, as shown in part A of 
Chart 13.14. The first differences of the trend values of a logistic curve, 
of the type discussed here, form a curve resembling a normal frequency 
distribution (see Chapter 23), as shown in part B of Chart 13.14. Because 
of this characteristic of the logistic curve, observed data are sometimes 
plotted on arithmetic probability paper (see Chart 23.9 and the 
accompanying discussion) to see if the trend appears to be a straight line. If 
so, the logistic curve may be fitted. 

Chart 13.14A. First Differences of the Gompertz Trend Values of Domestic 
Consumption of Rayon Filament Yarn, 1915-2053. (Vertical scale: millions 
of pounds.) 

Chart 13.14B. First Differences of the Logistic Trend Values for Population 
of Continental United States, 1770-2070. (Vertical scale: millions of persons.) 
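
The bell-like shape of the first differences can be checked numerically for the logistic fitted in Table 13.6. The sketch below is illustrative only: the helper name `yc` is hypothetical, the constants are the rounded values of the trend equation, and the decennial range 1770-2070 follows the caption of Chart 13.14B. 

```python
def yc(X, k=190.293, a=1.274670, b=-0.1381596):
    """Logistic trend value for X measured in decades from 1820."""
    return k / (1 + 10 ** (a + b * X))


years = list(range(1770, 2071, 10))
trend = [yc((year - 1820) / 10) for year in years]

# First differences between successive decennial trend values.
diffs = [later - earlier for earlier, later in zip(trend, trend[1:])]

# Decade in which the increase is largest, and the point of inflection
# (where a + bX = 0, i.e. Yc = k/2).
peak = max(range(len(diffs)), key=diffs.__getitem__)
peak_start_year = years[peak]
inflection_year = 1820 + 10 * (-1.274670 / -0.1381596)

print(peak_start_year, round(inflection_year, 1))
```

The differences rise steadily to a single maximum in the decade containing the point of inflection and fall away on the other side in nearly mirror-image fashion, which is the property that makes arithmetic probability paper useful as a test. 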

When plotted on semi-logarithmic paper, the Gompertz curve has the 
appearance of a modified exponential curve; when plotted on a grid with 
a reciprocal vertical scale and an arithmetic horizontal scale 
(alternatively, 1/Yc and X may be plotted on arithmetic paper), the logistic 
curve has the appearance of a modified exponential curve. 


SELECTING A TREND TYPE 

This, and the preceding chapter, have not attempted an exhaustive 
treatment of the types of trends that may be utilized. However, a suffi- 
cient variety has been given to meet most of the needs for time series 
analysis. With such a large number of trend types available, how can 
one decide which to use? First, the trend type should be compatible 
with the behavior of the forces which we seek to measure. If the object 
is solely to obtain cyclical deviations, the trend should pass through the 
approximate center of each cycle. If it is desired to extend the trend for 
purposes of forecasting, the trend and its extension should conform to 
expectations dictated by logic. If, for instance, the series is such that it 
may logically be expected to flatten out, an asymptotic curve should be 
selected. When the objective is solely historical study, the future behav- 
ior of the curve is not so important. 

The first step in deciding what trend type to use should always consist 
of plotting the observed data on arithmetic paper and then, if the trend 
is not linear but either (1) upward and concave upward or (2) downward 
and concave upward, on semi-logarithmic paper. Examination of the 
plotted data will frequently provide an adequate basis for deciding upon 
the type of trend to use. When further guidance is needed, an approxi- 
mate trend may be drawn by inspection and the following tests applied to 
the smoothed curve: 


1. If the first differences tend to be constant, use a straight line. 

2. If the second differences tend to be constant, use a second-degree 
curve. 

3. If the first differences tend to decrease by a constant percentage, use 
a modified exponential. 

4. If the approximate trend, when plotted on arithmetic paper, is a 
straight line, use a straight line. 

5. If the approximate trend, when plotted on semi-logarithmic paper, 
is a straight line, use an exponential curve. 

6. If the approximate trend, when plotted on semi-logarithmic paper, 
resembles a modified exponential, use a Gompertz curve. 

7. If the approximate trend, when plotted on a grid with a reciprocal 
vertical scale and an arithmetic horizontal scale, resembles a modified 
exponential, use a logistic curve. Alternatively, 1/Yc and X may be plotted 
on an arithmetic grid. 

8. If the first differences resemble a skewed frequency curve, use a 
Gompertz curve, or a more complex logistic curve than the one described here. 

9. If the first differences resemble a normal frequency curve, use a 
logistic curve. 

10. If the first differences of the logarithms are constant, use an 
exponential curve. 

11. If the second differences of the logarithms are constant, fit a 
second-degree curve to the logarithms. 

12. If the first differences of the logarithms are changing by a constant 
percentage, use a Gompertz curve. 

13. If the first differences of the reciprocals are changing by a constant 
percentage, use a logistic curve. 

14. If the approximate trend values (or the original data), when 
expressed as percentages of a selected asymptote, appear linear on 
arithmetic probability paper, use a logistic curve. 

This involves: (1) assuming an asymptote and (2) expressing the observed data 
as percentages of the asymptote, before plotting. More than one asymptote may be 
tried. 
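
The numerical tests in the list above (items 1 to 3 and 10 to 13) lend themselves to a mechanical check. The Python sketch below is one possible arrangement, not a procedure given in the text: the function names, the tolerance, and the order in which the tests are tried are all assumptions made for illustration, and the six short series are hypothetical, constructed so that each obeys one rule exactly. Real smoothed data will only "tend" to satisfy a rule, so a looser tolerance and some judgment would be needed in practice. 

```python
import math


def diffs(values):
    """First differences of a sequence."""
    return [b - a for a, b in zip(values, values[1:])]


def ratios(values):
    """Ratios of successive terms; empty if any divisor is zero."""
    if any(v == 0 for v in values[:-1]):
        return []
    return [b / a for a, b in zip(values, values[1:])]


def is_constant(values, rel_tol=1e-6):
    """True if there are at least two values and all are nearly equal."""
    if len(values) < 2:
        return False
    return all(math.isclose(v, values[0], rel_tol=rel_tol, abs_tol=1e-9)
               for v in values)


def suggest_trend(y):
    """Apply the difference tests to a smoothed, strictly positive series."""
    d1 = diffs(y)
    if is_constant(d1):                       # test 1
        return "straight line"
    if is_constant(diffs(d1)):                # test 2
        return "second-degree curve"
    log_d1 = diffs([math.log10(v) for v in y])
    if is_constant(log_d1):                   # test 10
        return "exponential"
    if is_constant(ratios(d1)):               # test 3
        return "modified exponential"
    if is_constant(diffs(log_d1)):            # test 11
        return "second-degree curve fitted to the logarithms"
    if is_constant(ratios(log_d1)):           # test 12
        return "Gompertz"
    if is_constant(ratios(diffs([1 / v for v in y]))):   # test 13
        return "logistic"
    return "no simple type indicated"


# Hypothetical series, each built to follow one rule exactly.
X = range(8)
examples = {
    "line": [5 + 2 * x for x in X],
    "parabola": [3 + 2 * x + 0.5 * x ** 2 for x in X],
    "exponential": [3 * 1.2 ** x for x in X],
    "modified exponential": [100 - 60 * 0.8 ** x for x in X],
    "Gompertz": [200 * 0.05 ** (0.7 ** x) for x in X],
    "logistic": [190.293 / (1 + 10 ** (1.274670 - 0.1381596 * x)) for x in X],
}
for name, series in examples.items():
    print(name, "->", suggest_trend(series))
```

The exponential test is tried before the modified exponential test because a simple exponential also has first differences changing by a constant percentage; it is the special case of the modified exponential with an asymptote of zero. 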

Series are sometimes encountered which appear to have had a trend of 
one type during one part of the period and a different trend of the same, 
or a different, type during another part of the period. Changes in trend 
are most likely to have occurred during the 1930's. 

Rarely, several trends, each having the same number of constants, 
appear equally suitable for a series of data. In such an event, that one 
is to be preferred from which the sum of the squared deviations of the Y 
values is a minimum. In making such a comparison, curves fitted to Y values 
should not be compared with those fitted to log Y values. 

Occasionally, none of the previously mentioned aids will enable one 
to decide what trend type to use. This may be because the approximate 
trend was not properly selected. Or, it may be that the series does not 
conform to any simple mathematical description. In a dynamic world, 
the forces in operation are seldom allowed to work out their full effects 
before other factors make themselves felt. As a result, any trend type 
may be appropriate for only a relatively short period. 



CHAPTER 14 


Analysis of Time Series: 

PERIODIC MOVEMENTS I— CONSTANT 
SEASONAL PATTERNS 


As indicated in Chapter 11, there are many types of periodic move- 
ments, including those that repeat themselves daily, weekly, monthly, or 
annually. In this chapter most attention will be given to those monthly 
movements within a year commonly known as seasonal movements. 
The principles laid down can easily be applied to the various other types 
of periodic movements. It will be the plan of this discussion to start 
with data which lend themselves to very simple treatment, and gradually 
to introduce more complex methods as they are required. Consideration 
of seasonal movements that vary in their pattern from year to year will, 
however, be reserved for the following chapter. In general, all of the 
methods involve averaging, in some manner, the values of the different 
Januaries, then the values of the different Februaries, and so forth, but 
differ chiefly in the degree to which the data are refined before being 
averaged. 

AN INTRODUCTORY ILLUSTRATION 

Averages of unadjusted data. When the data do not contain cyclical 
movements or trend to any appreciable extent, it will suffice to average 
the data without making any previous adjustment. An illustration of 
such data is the number of books issued and renewed for home use at the 
main loan desk of the Columbia University Libraries during the 1952- 
1953 winter semester. The data are shown in Table 14.1, from which 
were excluded those weeks in which a holiday occurred and also the weeks 
before final examinations, the week before the Christmas vacation, and 
the week before the November 4, 1952 presidential Election Day holiday. 
Below each column of data is given the average of that column. The 
averages, one for each day of the week, constitute a measure of the intra- 
week fluctuation in circulation of books. For convenience, however, it 
may be desirable to express this measure in percentage form. By 
dividing each of the six daily averages by the average of those six averages 
(which is the average per day for the entire period), and expressing each 
of the six daily averages as a percentage, we obtain the index shown in the 
last row of Table 14.1. 


TABLE 14.1 

Computation of Index of Intra-Week Variation, Using Averages of 
Unadjusted Data, of the Number of Books Issued and Renewed for Home 
Use at the Main Loan Desk of the Columbia University Libraries, 
Winter Semester, 1952-1953 

Week                                                                          Average
beginning:        Monday    Tuesday   Wednesday Thursday  Friday    Saturday  per day

Sept. 29          541       533       561       487       513       364       499.8
Oct. 6            674       559       524       590       532       300       529.8
Oct. 13           710       475       641       597       566       337       554.3
Oct. 20           659       484       540       543       500       376       517.0
Nov. 10           576       496       545       655       586       363       536.8
Nov. 17           720       592       603       626       561       533       605.8
Dec. 1            666       539       548       504       545       464       544.3
Dec. 8            701       601       550       635       759       422       611.3
Jan. 5            792       565       548       551       617       486       593.2

Arithmetic mean   671.0     538.2     562.2     576.4     575.4     405.0     554.7
Index             121.0     97.0      101.4     103.9     103.7     73.0      100.0

Data from Circulation Department, Columbia University Libraries. Excluded are those weeks in 
which a holiday occurred and also the week before final examinations, the week before the Christmas 
vacation, and the week before the Nov. 4, 1952 presidential Election Day holiday. 
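
As a check on the last two rows of Table 14.1, the computation can be sketched in Python. The variable names are hypothetical and chosen only for this illustration; the figures are the daily circulation counts of Table 14.1, entered so that each row agrees with its printed average per day. 

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

# Books issued and renewed, one row per week (Table 14.1).
circulation = {
    "Sept. 29": [541, 533, 561, 487, 513, 364],
    "Oct. 6":   [674, 559, 524, 590, 532, 300],
    "Oct. 13":  [710, 475, 641, 597, 566, 337],
    "Oct. 20":  [659, 484, 540, 543, 500, 376],
    "Nov. 10":  [576, 496, 545, 655, 586, 363],
    "Nov. 17":  [720, 592, 603, 626, 561, 533],
    "Dec. 1":   [666, 539, 548, 504, 545, 464],
    "Dec. 8":   [701, 601, 550, 635, 759, 422],
    "Jan. 5":   [792, 565, 548, 551, 617, 486],
}

rows = list(circulation.values())

# Arithmetic mean of each column (one mean per day of the week).
daily_means = [sum(column) / len(column) for column in zip(*rows)]

# Average of the six daily means = average per day for the whole period.
overall_mean = sum(daily_means) / len(daily_means)

# Each daily mean as a percentage of the overall mean.
index = [100 * mean / overall_mean for mean in daily_means]

for day, mean, value in zip(days, daily_means, index):
    print(f"{day:<10} {mean:7.1f} {value:7.1f}")
```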


Percentages of simple averages. A glance at the data of average 
circulation per day for the nine weeks, shown in the last column of Table 
14.1, makes it clear that activity was greater in some weeks than in 
others. The procedure which was followed in Table 14.1 allowed the 
weeks of larger circulation to exert more weight on the daily averages, 
and thus on the index, than that exerted by the weeks of smaller 
circulation. It might be thought offhand that such extra weight is highly 
desirable, but it must be remembered that we are trying to determine a 
typical pattern, and it does not necessarily follow that weeks of large 
circulation are weeks having a typical pattern. If the figures for each 
day of a given week are expressed as percentages of the average for that 
week, as in Table 14.2, each week will be of equal importance in deter- 
mining the index of intra-week variation. Furthermore, by putting the 
data into percentage form, we can more readily detect erratic variations 
from the typical weekly pattern. A study of such percentage data for 
each day may lead one to select some average other than the arithmetic 
mean. Thus, in the present instance, the percentage data of Table 14.2 
have been put into arrays in Table 14.3 and in Chart 14.1. It is clear, 
from Chart 14.1, that a periodic movement is present. It is clear, too, 




Chart 14.1. Arrays of Percentages of Daily Averages for Each Week 
for Number of Books Issued and Renewed for Home Use at the Main 
Loan Desk of the Columbia University Libraries, Winter Semester, 
1952-1953. Data of Table 14.3. (Horizontal scale: Monday through 
Saturday.) 

Chart 14.2. Indexes of Intra-Week Variation of Number of Books 
Issued and Renewed for Home Use at the Main Loan Desk of the 
Columbia University Libraries, Winter Semester, 1952-1953. Data 
from Tables 14.1 and 14.3. (Vertical scale: per cent.) 
TABLE 14.2 

Percentages* of Daily Averages for Each Week for Number of Books Issued 
and Renewed for Home Use at the Main Loan Desk of the Columbia 
University Libraries, Winter Semester, 1952-1953 

(The daily averages for each week are shown in the last column of Table 14.1.) 

Week
beginning:        Monday    Tuesday   Wednesday Thursday  Friday    Saturday

Sept. 29          108.2     106.6     112.2     97.4      102.6     72.8
Oct. 6            127.2     105.5     98.9      111.4     100.4     56.6
Oct. 13           128.1     85.7      115.6     107.7     102.1     60.8
Oct. 20           127.5     93.6      104.4     105.0     96.7      72.7
Nov. 10           107.3     92.4      101.5     122.0     109.2     67.6
Nov. 17           118.8     97.7      99.5      103.3     92.6      88.0
Dec. 1            122.4     99.0      100.7     92.6      100.1     85.2
Dec. 8            114.7     98.3      90.0      103.9     124.2     69.0
Jan. 5            133.5     95.3      92.4      92.9      104.0     81.9

* Each row averages 100.0. 
Based on data of Table 14.1. 

TABLE 14.3 

Computation of Index of Intra-Week Variation, Using Percentages of the 
Daily Average for Each Week, of the Number of Books Issued and 
Renewed for Home Use at the Main Loan Desk of the Columbia 
University Libraries, Winter Semester, 1952-1953 

Rank                    Monday    Tuesday   Wednesday Thursday  Friday    Saturday  Average

1                       133.5     106.6     115.6     122.0     124.2     88.0      ...
2                       128.1     105.5     112.2     111.4     109.2     85.2
3                       127.5     99.0      104.4     107.7     104.0     81.9
4                       127.2     98.3      101.5     105.0     102.6     72.8
5                       122.4     97.7      100.7     103.9     102.1     72.7
6                       118.8     95.3      99.5      103.3     100.4     69.0
7                       114.7     93.6      98.9      97.4      100.1     67.6
8                       108.2     92.4      92.4      92.9      96.7      60.8
9                       107.3     85.7      90.0      92.6      92.6      56.6

Mean of middle seven    121.0     97.4      101.4     103.1     102.2     72.9      99.7
Index                   121.4     97.7      101.7     103.4     102.5     73.1      100.0

Data of Table 14.2. 


that there are a few extreme values which do not fit into the general 
pattern. The effect of such extremes can be greatly decreased by using 
the median for each day; or, the extreme values can be eliminated by 
using the arithmetic mean of a central group of values for each day. In 
Table 14.3 the average of the middle 7 values for each day is shown. 
Since these six figures are modified means, they do not average exactly 
100.0. Instead, they average 99.7 and are adjusted to average 100.0 by 
dividing each of them by 99.7 and multiplying by 100 to obtain the index 
shown in the last row of Table 14.3. The indexes of Tables 14.1 and 14.3 
are shown in Chart 14.2. They do not differ greatly, because the nine 
weeks are not greatly different in importance. 

If the reader will compute an index using the median for each day, or the mean of 
the middle five values for each day, he will find that the six values will differ only 
slightly from those shown in Table 14.3. 
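
The steps of Tables 14.2 and 14.3 can be sketched in Python as follows. The function name `modified_mean` and the other identifiers are hypothetical, chosen for this illustration; the data are the daily counts of Table 14.1. Because the sketch carries unrounded percentages throughout, its results may differ from the printed index by a few hundredths of a point. 

```python
# Books issued and renewed, Monday through Saturday, one row per week
# (Table 14.1).
rows = [
    [541, 533, 561, 487, 513, 364],   # Sept. 29
    [674, 559, 524, 590, 532, 300],   # Oct. 6
    [710, 475, 641, 597, 566, 337],   # Oct. 13
    [659, 484, 540, 543, 500, 376],   # Oct. 20
    [576, 496, 545, 655, 586, 363],   # Nov. 10
    [720, 592, 603, 626, 561, 533],   # Nov. 17
    [666, 539, 548, 504, 545, 464],   # Dec. 1
    [701, 601, 550, 635, 759, 422],   # Dec. 8
    [792, 565, 548, 551, 617, 486],   # Jan. 5
]


def modified_mean(values, drop=1):
    """Mean after discarding the `drop` highest and `drop` lowest values."""
    ordered = sorted(values)
    kept = ordered[drop:len(ordered) - drop]
    return sum(kept) / len(kept)


# Table 14.2: each day as a percentage of that week's average per day,
# so that every week carries equal weight.
percentages = [[100 * v / (sum(week) / len(week)) for v in week]
               for week in rows]

# Table 14.3: mean of the middle seven of the nine values for each day.
middle_seven = [modified_mean(column) for column in zip(*percentages)]

# Adjust the six modified means so that they average exactly 100.0.
average = sum(middle_seven) / len(middle_seven)
index = [100 * m / average for m in middle_seven]

print([round(m, 1) for m in middle_seven])
print([round(v, 1) for v in index])
```

Setting `drop=2` gives the mean of the middle five values mentioned in the footnote, and replacing `modified_mean` by a median gives the other alternative named there. 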

SEASONAL INDEXES OF MONTHLY DATA 

A seasonal index, showing the typical intra-year movement of a series, 
is ordinarily based upon monthly data, but such an index may be constructed 
from weekly data. While a seasonal index could be made from 
daily data, the index would be likely to reflect intra-month and intra- 
week movements as well as seasonal variations. In this text we shall 
limit our attention to seasonal indexes obtained from monthly data. 

Before setting out to compute a seasonal index, one should be sure that 
a seasonal movement is present in the series. This may be apparent from 
experience with the subject matter represented by the data. In the case 
of the book-circulation data of Table 14.1, the librarians knew that intra- 
week variations were present, so no preliminary examination of the data 
was necessary. Similarly, the reader knows that seasonal variations 
exist in the consumption of ice cream, the use of gasoline, department 
store sales, and in various other series. However, the investigator may 
not always know if the series in which he is interested has a seasonal, and, 
unless he assures himself that a seasonal movement is present, it is con- 
ceivable that he might perform the extensive calculations to be described 
later and learn at the very end of his work that his index figures were all 
approximately 100.0. 

To ascertain if a seasonal is present in a series, it will usually suffice to 
draw a curve of the data such as the lighter line of Chart 14.4 or to make 
a chart like Chart 14.5. In some instances, it may not be possible to be 
sure there is a seasonal movement by examining charts of the raw data 
and it may be necessary to proceed far enough with the analysis to make 
charts like Charts 14.1 and 14.7. Occasionally charts such as Chart 15.2 
must be constructed before a decision can be made. 

A seasonal index based on percentages of trend. If a series of 
monthly data exhibits secular trend, a seasonal index computed by either 
of the simple methods previously described will have an upward or 
downward bias, depending on the direction of the trend. Thus, if the trend 
were upward and linear, each December would be higher than the 
preceding January by an amount equal to 11/12 of the annual growth, even if 
there were no genuine seasonal movement present. Because of this fact, 
the seasonal index, which is supposed to exhibit seasonal movements only, 
would slope upward; and, if there were a true seasonal movement, the 
December index number would be too high relative to the January index 
number by 11/12 of the annual growth. Of course, the trend may not be 
upward and linear. It may be downward and linear, in which case the 
December figure would be too low. If the trend is non-linear, its effect 
on a seasonal index computed as in Table 14.1 or Table 14.3 cannot be so 
simply stated, but the effect is present and is often pronounced. 

* The procedure is described on pages 528-538 of the first edition of this text. 

Chart 14.3. Index of Typical Seasonal Variation in Life Insurance Death- 
Benefit Payments in the United States. From the Division of Statistics and 
Research of the Institute of Life Insurance. The index represents averages of the 
ratios of actual payments to trend values, the trend having been fitted to monthly 
data for 1942 through 1951. (Vertical scale: per cent.) 

The first really useful procedure for computing a seasonal index was 
designed to overcome this difficulty and was based on per-cent-of-trend 
data. In this method, the first step consists of determining a trend 
equation for the data and obtaining the monthly trend values. Next, the 
original monthly data are expressed as percentages of the monthly trend 
values. These percentages are put into a table like Table 14.3 but having 
12 columns, one for each month. The seasonal index is then obtained 
from twelve monthly medians or modified means just as in the last two 
rows of Table 14.3. 
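
The steps just described may be sketched in Python. Everything numerical in the sketch is hypothetical: the five-year monthly series is manufactured from an assumed straight-line trend and an assumed seasonal pattern, purely so that the result can be compared with a known answer, and the least-squares straight line stands in for whatever trend equation is judged appropriate in a real case. The names used are likewise assumptions made for illustration. 

```python
import statistics

# Hypothetical illustration only: five years of monthly data built as a
# straight-line trend multiplied by an assumed seasonal pattern
# (January through December, averaging 100).
assumed_seasonal = [90, 85, 95, 100, 105, 110, 115, 110, 105, 100, 95, 90]
series = [(200 + 2 * t) * assumed_seasonal[t % 12] / 100 for t in range(60)]


def straight_line_trend(y):
    """Least-squares straight line fitted to y against t = 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    slope = (sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [y_mean + slope * (t - t_mean) for t in range(n)]


# Step 1: monthly trend values.
trend = straight_line_trend(series)

# Step 2: original data expressed as percentages of trend.
pct_of_trend = [100 * y / tr for y, tr in zip(series, trend)]

# Step 3: one array per calendar month; take the median of each array.
medians = [statistics.median(pct_of_trend[m::12]) for m in range(12)]

# Step 4: adjust so that the twelve index numbers average 100.0.
seasonal_index = [100 * m / (sum(medians) / 12) for m in medians]

print([round(v, 1) for v in seasonal_index])
```

In this manufactured case the index recovered should lie close to the assumed pattern; with real data containing cyclical swings, the medians (or modified means) carry the burden of discounting the extreme percentages. 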


It is sometimes referred to as the Falkner method. See "The Measurement of 
Seasonal Variation," by Helen D. Falkner, Journal of the American Statistical 
Association, June 1924, pp. 167-179. 
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The per-cent-of-trend method ignores the disturbing effect of cyclical 
ups and downs. The highs and lows of cycles would appear as extreme 
dots in a chart like Chart 14.1 but which would have twelve arrays 
instead of six. This method depends upon the averaging process, that is, 
upon the use of the median or a modified mean, to eliminate the effect of 
cyclical highs and lows. At present, it is not a widely used method, but 
it may be applied to series having cyclical movements which are unim- 
portant relative to the seasonal movements. Such a series is the payment 

THOUSANDS OF 
SHORT TONS 



Chart 14.4. Consumption of Newsprint by United States Publishers, 
January 1943- June 1953, and Centered Twelve-Month Moving Average, 
Data of Table 14.5. 


of life insurance death benefits in the United States, and Chart 14.3 shows 
the seasonal index for this series computed by the ratio-to-trend method. 
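
The per-cent-of-trend procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the computation behind Chart 14.3: it assumes a straight-line least-squares trend and the use of monthly medians, and the names `linear_trend` and `percent_of_trend_index` are hypothetical.

```python
from statistics import median


def linear_trend(values):
    """Least-squares straight line fitted to values at x = 0, 1, 2, ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return [y_mean + slope * (x - x_mean) for x in range(n)]


def percent_of_trend_index(monthly, first_month=1):
    """Seasonal index from monthly medians of per-cent-of-trend figures.

    monthly: consecutive monthly values; first_month: calendar month (1-12)
    of the first value. Assumes a linear trend (an assumption of this sketch).
    """
    trend = linear_trend(monthly)
    arrays = [[] for _ in range(12)]          # one array per calendar month
    for i, (y, t) in enumerate(zip(monthly, trend)):
        arrays[(first_month - 1 + i) % 12].append(100 * y / t)
    medians = [median(a) for a in arrays]
    level = sum(medians) / 12                 # force the index to average 100
    return [100 * m / level for m in medians]
```

If the trend is clearly non-linear, the straight line would be replaced by whatever trend equation fits the series; the remaining steps are unchanged.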

Percentages of centered 12-month moving averages. The data 
which we shall use to illustrate the determination of a seasonal index 
which does not change from year to year have to do with the consumption 
of newsprint by United States publishers. Charts 14.4 and 14.5 make it 
clear that a seasonal movement is present and that it is approximately 
the same from year to year. Chart 14.5 may be termed a "year-over-year" 
chart, since each year is arbitrarily placed above the preceding 
year; the curve for each year has been plotted to the same vertical scale, 
but at a different level. 

The data of newsprint consumption have not been adjusted for calendar 
variation. The reason for not making this adjustment is that the published 
data are not so adjusted. If a seasonal index were to be made from 
the data adjusted for calendar days, then all monthly figures, including 
new ones as they appear, would have to be adjusted before they could be 
compared to the typical seasonal movement. Users of such data are, not 
infrequently, more interested in the monthly figures than in the per-day 
figures, the length of a month being sometimes thought of as contributing 
its part toward the typical seasonal variation. The procedure for computing 
an index of seasonal variation is the same whether the data have or 
have not been adjusted for calendar variation. One adjustment, however, 
has been made: February 1944, February 1948, and February 1952, 
each of which had 29 days, were adjusted to a 28-day basis. 

Chart 14.5. Year-Over-Year Charts of: (A) Consumption of Newsprint and 
(B) Percentages of Twelve-Month Moving Average, 1944-1952. Data of 
Table 14.5. In each part of the chart, the curve for each year is placed 
just above the curve for the preceding year. This is accomplished by using 
the same vertical scale for each of the nine curves, but raising or 
lowering the scale, as necessary. 

The percentage-of-12-month-moving-average method, which is ordinarily 
referred to merely as the per-cent-of-moving-average method (or 
just moving-average method), is in wide current use. It differs from the 
per-cent-of-trend method only in that the original data are expressed as 
percentages of the moving average instead of as percentages of trend. 
Computing the centered 12-month moving average involves more work 
than does the determination of trend values, but the resulting seasonal 
index is a better one. This is so because the moving average is a fairly 
good estimate of trend and cyclical movements combined. 

A 12-month moving average is a series of averages which embraces, 
first, the first 12 months of a series; next, the second to thirteenth months; 
then the third to fourteenth months; and so on. To be more specific, let 
us consider the data of newsprint consumption by United States pub- 
lishers, shown in Table 14.4. The first figure for the 12-month moving 
average is the average of the first 12 months, January 1943-December 
1943. In Column 4 of the table this is seen to be 226.68. Note that, 
being the average of the 12-month period January-December 1943, this 
figure is centered between June and July 1943. The second moving-average 
figure, 224.02, covers the period February 1943-January 1944 
and is centered between July and August 1943. Each figure in Column 4 
of Table 14.4 is the arithmetic mean of the six original figures which pre- 
cede it and the six original figures which follow it. 

Since the figures in Column 4 of Table 14.4 fall between each pair of 
months, while the original data in Column 2 are for calendar months and 
are centered at the middle of each month, it is necessary to adjust the 
moving averages so that they will be in step with the original data. This 
process is called centering* and involves computing a two-month moving 
average of the 12-month moving averages. Columns 5 and 6 of Table 
14.4 show how this is done. The result is a series of moving averages, 
properly centered and beginning with July 1943. These moving averages 
have been plotted in Chart 14.4. 

* Some statisticians do not bother to center a 12-month moving average, but 
arbitrarily place the average for each 12 months opposite the seventh month, contending 
that the loss in accuracy is more than offset by the saving in time. If a 
centered 12-month moving average is computed by the method described on the 
following pages and illustrated in Table 14.5, and if a mask is used to obtain the 
moving totals (see F. E. Croxton, Workbook in Applied General Statistics, Prentice-Hall, 
Inc., New York, 1950, third edition, p. 95), the centered 12-month moving 
average can be obtained almost as quickly as can the uncentered 12-month moving 
average. 

TABLE 14.4 

Computation of Centered 12-Month Moving Average for Consumption of 
Newsprint by United States Publishers, January 1943-June 1953 

Columns: (1) Year and month; (2) Consumption (thousands of short tons); 
(3) 12-month moving total; (4) 12-month moving average; (5) 2-month 
moving total; (6) Centered 12-month moving average, Col. 5 ÷ 2. 
Figures in Columns 3 and 4 are entered between the months at which 
they are centered. 

  (1)              (2)         (3)       (4)       (5)       (6)

1943
  January        226.7
  February       208.1
  March          237.1
  April          243.3
  May            248.3
  June           228.4
                           2,720.2    226.68
  July           212.3                          450.70
                           2,688.2    224.02
  August         217.1                          445.38
                           2,656.3    221.36
  September      222.7                          439.77
                           2,620.9    218.41
  October        235.5                          433.30
                           2,578.7    214.89
  November       222.3                          425.54
                           2,527.8    210.65
  December       218.4                          418.19
                           2,490.5    207.54
1944
  January        194.7                          411.96
                           2,453.1    204.42
  February       176.2                          405.95
                           2,418.4    201.53
  March          201.7                          400.31
                           2,385.3    198.78
  April          201.1                          396.10
                           2,367.9    197.32
  May            197.4                          393.75
                           2,357.2    196.43
  June           191.1                          391.83
                           2,344.8    195.40
  July           174.9                          390.01
                           2,335.3    194.61
  August         182.4                          389.13
                           2,334.2    194.52
  September      189.6                          389.13
                           2,335.3    194.61
  October        218.1                          389.39
                           2,337.4    194.78
  November       211.6                          390.26
                           2,345.8    195.48
  December       206.0                          390.91
                           2,345.2    195.43
  . . .
                           4,513.9    376.16
1952
  January        345.3                          752.01
                           4,510.2    375.85
  February       336.6                          751.46
                           4,507.3    375.61
  March          399.3                          751.08
                           4,505.6    375.47
  April          393.5                          752.66
                           4,526.3    377.19
  May            404.1                          755.57
                           4,540.5    378.38
  June           379.9                          756.66
                           4,539.3    378.28
  July           329.7                          757.10
                           4,545.8    378.82
  August         341.6                          758.42
                           4,555.2    379.60
  September      379.7                          761.01
                           4,576.9    381.41
  October        426.0                          764.10
                           4,592.3    382.69
  November       417.0                          767.51
                           4,617.8    384.82
  December       386.6                          769.74
                           4,619.1    384.92
1953
  January        351.8
  February       346.0
  March          421.0
  April          408.9
  May            429.6
  June           381.2

Data from U. S. Department of Commerce, Business Statistics, 1953 Biennial Edition, p. 179; 1951 
Biennial Edition, p. 178; and 1947 Statistical Supplement to the Survey of Current Business, p. 160. 

It is clear from Chart 14.4 that the centered moving-average figures do 
not reflect, to any appreciable degree, either the seasonal movement or 
irregular movements. It is not so clear, from Chart 14.4, that the moving 
average follows, approximately, the combined trend and cyclical pattern, 
since there is little cyclical movement in the series of newsprint consumption 
during the period under consideration. That a centered 12-month 
moving average does, indeed, describe the approximate trend and 
cyclical movements† may be observed more satisfactorily in Chart 15.1. 

Before proceeding with the computation of the seasonal index for newsprint 
consumption, it will be well to look again at Table 14.4 and to note 
that the procedures indicated in that table are more laborious than necessary. 
We do not need to compute the moving average of Column 4. 
We could, instead, compute a two-month moving total of the figures in 
Column 3 and then divide each of these totals by 24 to obtain exactly the 
same figures as are shown in Column 6 of Table 14.4. There is, however, 
an even more expeditious procedure, which we shall employ. Consider 
the centered moving average for July 1943. This figure was obtained 
by totaling the value for January 1943, twice the value for February 1943, 
twice the value for each of the following months through December 1943, 
and the value for January 1944, and dividing this total by 24. Similarly, 
the average for August 1943 is the result of dividing by 24 the sum of: the 
February 1943 value, twice each of the next 11 values, and the value for 
February 1944. In other words, what we have actually done in com- 
puting a centered 12-month moving average is to compute a 13-month 
moving average with the months weighted 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1. 
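
The equivalence just described can be checked numerically. The sketch below, in Python, computes the centered 12-month moving average both ways (two-step centering, and the 13-month average weighted 1, 2, ..., 2, 1) for the first fourteen months of the newsprint series. The function names are hypothetical; the data are from Column 2 of Table 14.5.

```python
WEIGHTS = [1] + [2] * 11 + [1]   # 13 weights whose sum is 24


def centered_ma_two_step(x):
    """12-month moving averages, then a 2-month moving average of those."""
    ma12 = [sum(x[i:i + 12]) / 12 for i in range(len(x) - 11)]
    return [(a + b) / 2 for a, b in zip(ma12, ma12[1:])]


def centered_ma_weighted(x):
    """Weighted 13-month moving total divided by 24."""
    return [sum(w * v for w, v in zip(WEIGHTS, x[i:i + 13])) / 24
            for i in range(len(x) - 12)]


# January 1943 through February 1944 (thousands of short tons).
newsprint = [226.7, 208.1, 237.1, 243.3, 248.3, 228.4, 212.3,
             217.1, 222.7, 235.5, 222.3, 218.4, 194.7, 176.2]

two_step = centered_ma_two_step(newsprint)   # centered at July, August 1943
weighted = centered_ma_weighted(newsprint)   # same two months
```

In both results the first entry belongs to the seventh month of the data, here July 1943.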

Table 14.5 shows the computation of the weighted 13-month moving 
total and of the 12-month centered moving average. The procedure is 
as follows: 

1. Using an adding machine, compute the weighted 13-month moving 
total for July of each year and also the last moving total, which in Table 
14.5 is for December 1952. The total for each July will include values 
from the preceding January to the following January, inclusive. The 
total for December 1952 will include values from June 1952 through June 
1953. These values are entered in Column 3 of Table 14.5 and serve as 
check values for the moving totals to be obtained in step 2. 

† When a series shows pronounced cyclical movements, the centered 12-month 
moving average may not move high enough into the cyclical peaks or low enough into 
the cyclical lows. It should be clear why this is so, since, when a centered 12-month 
moving average is centered at a cyclical high point, the average would be influenced 
not only by the value for the middle month, but also by the six preceding and the six 
following months, all or most of which would have values lower than that of the middle 
month. The reverse would be true when the moving average is centered at a cyclical 
low point. Because of the foregoing, some statisticians smooth and alter the moving-average 
curve, usually by a freehand process, to obtain what is believed to be a better 
estimate of the combined trend and cyclical movements. The original values are 
then expressed as percentages of the values on this new curve. See, for example, 
"Adjustment for Seasonal Variation," by H. C. Barton, Federal Reserve Bulletin, 
June 1941, pp. 518-528. 

TABLE 14.5 

Short Method of Computing Centered 12-Month Moving Average and Percentages 
of Moving Average for Consumption of Newsprint by United 
States Publishers, January 1943-June 1953 

Columns: (1) Year and month; (2) Consumption (thousands of short tons); 
(3) 13-month moving total weighted 1, 2, 2, ..., 2, 2, 1; (4) Centered 
12-month moving average, Col. 3 ÷ 24; (5) Per cent of 12-month moving 
average, Col. 2 ÷ Col. 4. A check mark (✓) in Column 3 denotes a check value. 

  (1)              (2)          (3)        (4)       (5)

1943
  January        226.7          ...        ...       ...
  February       208.1          ...        ...       ...
  March          237.1          ...        ...       ...
  April          243.3          ...        ...       ...
  May            248.3          ...        ...       ...
  June           228.4          ...        ...       ...
  July           212.3      5,408.4✓     225.4      94.2
  August         217.1      5,344.5      222.7      97.5
  September      222.7      5,277.2      219.9     101.3
  October        235.5      5,199.6      216.6     108.7
  November       222.3      5,106.5      212.8     104.5
  December       218.4      5,018.3      209.1     104.4
1944
  January        194.7      4,943.6      206.0      94.5
  February       176.2      4,871.5      203.0      86.8
  March          201.7      4,803.7      200.2     100.7
  April          201.1      4,753.2      198.0     101.6
  May            197.4      4,725.1      196.9     100.3
  June           191.1      4,702.0      195.9      97.5
  July           174.9      4,680.1✓     195.0      89.7
  August         182.4      4,669.5      194.6      93.7
  September      189.6      4,669.5      194.6      97.4
  October        218.1      4,672.7      194.7     112.0
  November       211.6      4,683.2      195.1     108.5
  December       206.0      4,691.0      195.5     105.4
1945
  January        185.2      4,693.4      195.6      94.7
  February       175.1      4,716.9      196.5      89.1
  March          202.8      4,761.1      198.4     102.2
  April          203.2      4,803.6      200.2     101.5
  May            205.8      4,846.9      202.0     101.9
  June           190.5      4,890.8      203.8      93.5
  July           177.9      4,946.1✓     206.1      86.3
  August         202.9      5,030.1      209.6      96.8
  September      213.3      5,143.1      214.3      99.5
  October        236.9      5,263.8      219.3     108.0
  November       236.1      5,375.3      224.0     105.4
  December       225.4      5,499.8      229.2      98.3
1946
  January        221.1      5,633.8      234.7      94.2
  February       223.2      5,753.4      239.7      93.1
  March          267.7      5,860.1      244.2     109.6
  April          259.0      5,967.7      248.7     104.1
  May            261.5      6,078.4      253.3     103.2
  June           259.3      6,203.2      258.5     100.3
  July           243.1      6,317.9✓     263.2      92.4
  August         257.3      6,398.4      266.6      96.5
  September      265.6      6,468.6      269.5      98.6
  October        292.2      6,542.1      272.6     107.2
  November       291.5      6,622.1      275.9     105.7
  December       294.8      6,697.0      279.0     105.7
1947
  January        266.4      6,751.0      281.3      94.7
  February       258.4      6,795.4      283.1      91.3
  March          302.7      6,853.4      285.6     106.0
  April          297.5      6,934.7      288.9     103.0
  May            303.0      7,028.3      292.8     103.5
  June           292.7      7,102.1      295.9      98.9
  July           263.7      7,155.5✓     298.1      88.5
  August         281.1      7,220.6      300.9      93.4
  September      299.8      7,295.2      304.0      98.6
  October        339.3      7,375.9      307.3     110.4
  November       338.0      7,466.8      311.1     108.6
  December       322.1      7,547.0      314.5     102.4
1948
  January        292.5      7,609.3      317.1      92.2
  February       297.4      7,670.1      319.6      93.1
  March          338.3      7,740.4      322.5     104.9
  April          342.6      7,820.2      325.8     105.2
  May            348.8      7,888.9      328.7     106.1
  June           327.1      7,956.8      331.5      98.7
  July           291.6      8,038.6✓     334.9      87.1
  August         314.0      8,090.2      337.1      93.1
  September      337.2      8,130.2      338.8      99.5
  October        381.7      8,185.1      341.0     111.9
  November       364.3      8,254.8      344.0     105.9
  December       363.7      8,321.0      346.7     104.9
1949
  January        332.7      8,365.3      348.6      95.4
  February       308.8      8,390.8      349.6      88.3
  March          366.9      8,414.1      350.6     104.6
  April          368.9      8,451.0      352.1     104.8
  May            392.2      8,482.9      353.5     110.9
  June           349.9      8,506.0      354.4      98.7
  July           313.1      8,527.2✓     355.3      88.1
  August         318.0      8,564.0      356.8      89.1
  September      356.5      8,618.4      359.1      99.3
  October        399.3      8,683.3      361.8     110.4
  November       378.6      8,727.9      363.7     104.1
  December       372.5      8,764.2      365.2     102.0
1950
  January        345.1      8,814.6      367.3      94.0
  February       333.2      8,867.0      369.5      90.2
  March          396.9      8,913.1      371.4     106.9
  April          403.8      8,951.9      373.0     108.3
  May            401.9      9,002.7      375.1     107.1
  June           376.5      9,057.8      377.4      99.8
  July           336.8      9,084.1✓     378.5      89.0
  August         346.8      9,088.0      378.7      91.6
  September      373.8      9,088.9      378.7      98.7
  October        420.8      9,093.3      378.9     111.1
  November       407.9      9,101.5      379.2     107.6
  December       398.3      9,091.6      378.8     105.1
1951
  January        345.6      9,077.0      378.2      91.4
  February       336.6      9,071.3      378.0      89.0
  March          394.4      9,076.6      378.2     104.3
  April          410.7      9,068.7      377.9     108.7
  May            403.2      9,048.1      377.0     106.9
  June           365.3      9,032.5      376.4      97.1
  July           333.4      9,021.7✓     375.9      88.7
  August         344.5      9,021.4      375.9      91.6
  September      381.4      9,026.3      376.1     101.4
  October        405.3      9,014.0      375.6     107.9
  November       402.8      8,997.7      374.9     107.4
  December       387.8      9,013.2      375.6     103.2
1952
  January        345.3      9,024.1      376.0      91.8
  February       336.6      9,017.5      375.7      89.6
  March          399.3      9,012.9      375.5     106.3
  April          393.5      9,031.9      376.3     104.6
  May            404.1      9,066.8      377.8     107.0
  June           379.9      9,079.8      378.3     100.4
  July           329.7      9,085.1✓     378.5      87.1
  August         341.6      9,101.0      379.2      90.1
  September      379.7      9,132.1      380.5      99.8
  October        426.0      9,169.2      382.0     111.5
  November       417.0      9,210.1      383.8     108.7
  December       386.6      9,236.9✓     384.9     100.4
1953
  January        351.8          ...        ...       ...
  February       346.0          ...        ...       ...
  March          421.0          ...        ...       ...
  April          408.9          ...        ...       ...
  May            429.6          ...        ...       ...
  June           381.2          ...        ...       ...

Data from U. S. Department of Commerce, Business Statistics, 1953 Biennial Edition, p. 179; 1951 
Biennial Edition, p. 178; and 1947 Statistical Supplement to the Survey of Current Business, p. 160. 

2. Using an adding machine* which will subtract, enter the weighted 
moving total figure for July 1943. Subtract the values for January and 
February 1943, add the values for January and February 1944, and subtotal. 
This subtotal is the weighted moving total for August 1943. 
Next subtract the values for February and March 1943, add the values for 
February and March 1944, and subtotal. This second subtotal is the value 
for September 1943. Continue the process of subtracting two values, adding 
two values, and subtotaling, as shown in the accompanying reproduction of 
a portion of an adding-machine tape. When the subtotal is obtained 
for July 1944, it should agree with the figure already obtained. Agreement 
is indicated for all of the July figures, and for December 1952, by check 
marks in Column 3 of Table 14.5. 

* If an adding machine with a subtraction bar is not available, a calculating machine 
may be used. It is possible to subtract on an adding machine which has no subtraction 
bar by adding the complement of a number (for example, the complement of 
276 would be entered as 99999724 on an eight-column adding machine). However, 
adding complements is not recommended for use in step 2, as the operator is likely to 
make numerous mistakes. 

3. Compute the centered moving average by divid- 
ing each figure in Column 3 of Table 14.5 by 24. This 
division may be accomplished most expeditiously by 
placing the reciprocal of 24 (which is 0.04166667) in 
the keyboard of a calculating machine and multiplying 
it by the values shown in Column 3 of Table 14.5. 
The machine need not be cleared between multiplica- 
tions, since it is merely necessary to increase or de- 
crease the multiplier to obtain the next product. If a 
calculating machine having automatic multiplication 
is being used, it will probably be preferable to clear 
out the result of each multiplication before proceeding 
to the next one; 0.04166667 should be retained in the 
machine for all of the multiplications. The results are 
shown in Column 4 of Table 14.5. 
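
Steps 1 to 3 translate directly into a short routine. The following Python sketch is an illustration only (the function name is hypothetical): it computes the first weighted total in full, obtains each later total by subtracting two values and adding two values as in step 2, and then multiplies by the reciprocal of 24 as in step 3. The data are from Column 2 of Table 14.5.

```python
def short_method_totals(x):
    """Weighted 13-month moving totals (weights 1, 2, ..., 2, 1)."""
    total = x[0] + 2 * sum(x[1:12]) + x[12]   # first total, computed in full
    totals = [total]
    for i in range(1, len(x) - 12):
        # subtract the two oldest values, add the two newest, and subtotal
        total = total - x[i - 1] - x[i] + x[i + 11] + x[i + 12]
        totals.append(total)
    return totals


RECIPROCAL_OF_24 = 0.04166667   # the multiplier given in step 3

# January 1943 through January 1945 (thousands of short tons).
newsprint = [226.7, 208.1, 237.1, 243.3, 248.3, 228.4,
             212.3, 217.1, 222.7, 235.5, 222.3, 218.4,
             194.7, 176.2, 201.7, 201.1, 197.4, 191.1,
             174.9, 182.4, 189.6, 218.1, 211.6, 206.0,
             185.2]

totals = short_method_totals(newsprint)            # July 1943 ... July 1944
centered = [t * RECIPROCAL_OF_24 for t in totals]  # Column 4 before rounding

# Check value for July 1944, computed in full as in step 1.
july_1944_check = newsprint[12] + 2 * sum(newsprint[13:24]) + newsprint[24]
```

The last running total should agree with `july_1944_check`, which plays the part of the check mark in Column 3.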

Portion of adding-machine tape for step 2 (a minus sign denotes a value 
subtracted; S denotes a subtotal): 

      Subtracted             Added            Subtotal
                                              5,408.40
   226.70-   208.10-     194.70   176.20      5,344.50 S
   208.10-   237.10-     176.20   201.70      5,277.20 S
   237.10-   243.30-     201.70   201.10      5,199.60 S
   243.30-   248.30-     201.10   197.40      5,106.50 S
   248.30-   228.40-     197.40   191.10      5,018.30 S
   228.40-   212.30-     191.10   174.90      4,943.60 S
   212.30-   217.10-     174.90   182.40      4,871.50 S
   217.10-   222.70-     182.40   189.60      4,803.70 S
   222.70-   235.50-     189.60   218.10      4,753.20 S
   235.50-   222.30-     218.10   211.60      4,725.10 S
   222.30-   218.40-     211.60   206.00      4,702.00 S
   218.40-   194.70-     206.00   185.20      4,680.10 S
   194.70-   176.20-     185.20   175.10      4,669.50 S

The next step in computing the seasonal index consists of expressing 
each original value as a percentage of the corresponding centered moving 
average. The results of this step are shown in Column 5 of Table 14.5 
and in Chart 14.6. The logic of the procedure is as follows: Time series 
are assumed to be composed of T × C × S × I (Trend × Cycle × Seasonal × 
Irregular). The 12-month moving average is a rough estimate of T × C 
because the 12-month average smoothes out seasonal movements and, for 
the most part, irregular movements, since the latter are largely 
movements of small amplitude and short duration. If now we divide the 
original data by the 12-month moving average, we have an estimate of 
the seasonal and irregular movements combined: 

$$\frac{T \times C \times S \times I}{T \times C} = S \times I$$

Chart 14.6 shows quite clearly the presence of the seasonal movement, 
which seems to be approximately the same from year to year. It is not 
exactly the same, since the spring peak is sometimes March, sometimes 
April, and sometimes May; also, the fall peak occurs in October, but 
occasionally November is almost as high. 
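
As a small numerical illustration of the division of the original data by the centered moving average, the Python sketch below reproduces the first six entries of Column 5 of Table 14.5 from Columns 2 and 4. The function name is hypothetical; rounding to one decimal place is assumed to follow the table.

```python
def percent_of_moving_average(original, centered_ma):
    """Each original value as a percentage of its centered moving average."""
    return [round(100 * y / m, 1) for y, m in zip(original, centered_ma)]


# July-December 1943: Columns 2 and 4 of Table 14.5.
consumption = [212.3, 217.1, 222.7, 235.5, 222.3, 218.4]
centered_ma = [225.4, 222.7, 219.9, 216.6, 212.8, 209.1]

percentages = percent_of_moving_average(consumption, centered_ma)
```

Each percentage is an estimate of S × I for that month, since the moving average stands in for T × C.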

From this point on, the procedure parallels that used for the library-circulation 
data expressed in percentage terms. First, however, we make 
Table 14.6, which puts the per-cent-of-moving-average data into a form 
which facilitates the construction of the arrays, which are shown in Table 
14.7. Notice that only those years for which 12 per-cent-of-moving-average 
figures were available are included in Tables 14.6 and 14.7. 

Chart 14.6. Percentages of Centered Twelve-Month Moving Average for 
Consumption of Newsprint by United States Publishers, 1944-1952. Data of 
Table 14.5 or 14.6. 

After making a table of the monthly arrays, a chart, such as Chart 14.7, 
should be constructed. A chart of the monthly arrays is often useful in 
helping one to decide what measure of central tendency to use in averag- 
ing the months; in addition, it gives a general indication of the seasonal 
pattern. 

There are two ways of deciding what items to eliminate. One way is 
to consider each array of Chart 14.7 separately and to eliminate items that 
appear to be unusually high or low, perhaps studying each large deviation 
individually and eliminating those for which a special circumstance can 
be discovered. If this method is followed, one array might use an 
average of all items; another might employ the median; a third, the 
central five items; a fourth, all items except the two highest; and so on. 
On account of the extreme subjectivity of the method, it is dangerous 
unless the statistician possesses a high order of knowledge and judgment. 
An alternative method, which is probably more frequently used, consists 
of computing the same type of modified mean for each month. No 
generally applicable rule can be set up for the selection of the appropriate 
modified mean, but the exclusion of the one highest value and one lowest 
value or the two highest and the two lowest values will often be found to 
be satisfactory. The number of items to exclude depends partly on the 
number of cycles included in a series; the larger the number of cyclical 
highs and lows which are reflected in the percentages of moving average 
(because they have not been completely smoothed out by the moving 
average), the more extreme items which may need to be excluded. For 
the newsprint consumption data of Table 14.7, we have used the mean of 
the middle seven values, with the results shown in the next-to-the-last 
row of the table. 

Chart 14.7. Arrayed Percentages of Moving Average and Seasonal 
Index for Consumption of Newsprint by United States Publishers, 
1944-1952. Data of Table 14.7. The highest and lowest value in each 
array was excluded for purposes of computing the seasonal index. 

The 12 modified means average 99.8. When each modified mean is 
divided by 99.8 and multiplied by 100, we get the seasonal index* shown 
in the last row of Table 14.7 and in Chart 14.7. Note that the 12 values 
of the seasonal index average 100.0. This is important, since seasonal 
variations will later be removed from the original data by dividing the 
original data by the seasonal index. If the seasonal index were to average 
less than 100.0, the adjusted figures would all be a little too large; if the 
seasonal index were to average more than 100.0, the adjusted figures 
would all be slightly too small. 

* A seasonal index based on the mean of the middle five items in Table 14.7 is so 
nearly the same that the curve could hardly be distinguished from that shown in 
Chart 14.7. The greatest difference for any one month is 0.3. The same is true for 
an index based on monthly medians, except that one month (May) shows a difference 
of 0.8. 
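
The modified mean and the adjustment of the twelve means to average 100.0 may be sketched in Python as follows. The function names and the `drop_each_end` parameter are hypothetical; the July percentages are those of Column 5 of Table 14.5 for 1944-1952, used here only to illustrate the mean of the middle seven values.

```python
def modified_mean(values, drop_each_end=1):
    """Mean after discarding the highest and lowest drop_each_end values."""
    ordered = sorted(values)
    kept = ordered[drop_each_end:len(ordered) - drop_each_end]
    return sum(kept) / len(kept)


def seasonal_index(arrays, drop_each_end=1):
    """Twelve modified means, scaled so that they average 100."""
    means = [modified_mean(a, drop_each_end) for a in arrays]
    average = sum(means) / len(means)
    return [100 * m / average for m in means]


# July percentages of moving average, 1944-1952 (Table 14.5, Column 5).
july = [89.7, 86.3, 92.4, 88.5, 87.1, 88.1, 89.0, 88.7, 87.1]
july_modified_mean = modified_mean(july)   # mean of the middle seven values
```

Setting `drop_each_end=2` gives the mean of the middle five items mentioned in the footnote.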

Link relatives. At one time the link-relative method was the most 
widely used method of obtaining a seasonal index. The computations 
involved are less extensive than those required by the moving-average 
method, but the link-relative method is less satisfactory than the 
moving-average method; in particular, it is not readily adaptable to the 
determination of changing seasonal movements, a topic treated in the 
following chapter. 

The first step in this method consists of expressing each monthly value 
as a percentage of the preceding monthly value. These are the link relatives. 
From this point on, the procedure* is the same as shown in Table 
14.7, except that the 12 monthly averages are generally found to contain 
some residual trend, which was not eliminated by computing the link 
relatives. Adjustment for this residual trend must be made before the 
seasonal index is obtained. 
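
The first step of the link-relative method is easily illustrated. The Python sketch below (hypothetical function name) computes link relatives for the first four months of the newsprint series of Table 14.5; it does not attempt the later adjustment for residual trend.

```python
def link_relatives(x):
    """Each value as a percentage of the value for the preceding month."""
    return [100 * current / previous for previous, current in zip(x, x[1:])]


# January-April 1943 (thousands of short tons), Table 14.5, Column 2.
newsprint = [226.7, 208.1, 237.1, 243.3]
relatives = [round(r, 1) for r in link_relatives(newsprint)]
```

There is one fewer link relative than there are original values, since the first month has no predecessor.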

ADEQUACY OF THE SEASONAL INDEX 

One test of a seasonal index is provided by the chart of the arrays, as 
shown in Chart 14.7. If the individual arrays are widely dispersed (that 
is, cover a wide range vertically), we can have little confidence in the 
seasonal index. The less the dispersion of the individual monthly 
arrays, the more uniform is the seasonal movement from year to year. 

It is possible to ascertain (by the method described in Chapter 24) 
whether a given modified mean differs significantly from 100 or, using 
the method of analysis of variance (discussed in Chapter 26), to ascertain 
whether the 12 modified means as a group differ significantly from each 
other. However, these procedures are of dubious value, primarily because 
the distributions from which the means were computed were not random 
distributions, and also because the means were modified means, computed 
after part of the data had been rejected. 

A practical test of the adequacy of a seasonal index is to use it to 
eliminate the seasonal variation in the series, and then to observe whether 
any residual seasonal movements are present. We shall return to this 
point in Chapter 16. 

* The method is more fully described on pp. 486-492 of the first edition of this text. 
The advantages and disadvantages of the link-relative method are set forth there in 
more detail. 



CHAPTER 15 


Analysis of Time Series: 

PERIODIC MOVEMENTS II— CHANGING 
SEASONAL PATTERNS 


In Chapter 14 we considered procedures for determining seasonal 
indexes for series having patterns which underwent little or no change 
during the period with which we were concerned. Some time series have 
seasonal patterns which change. Changes may be progressive, which is 
to say that the seasonal pattern varies gradually from year to year, or 
they may be of a more abrupt nature, reflecting, for example, changes 
in the date of Easter or the shifting date of some important event, such 
as the New York automobile show in the fall of 1935, which was men- 
tioned in Chapter 11. 

PROGRESSIVE CHANGES IN SEASONAL PATTERN 

A moving seasonal. Chart 15.1 shows monthly data of the linage 
of magazine advertising in the United States from July 1942 to June 1953. 
As will be clear later, this series has a progressive change in seasonal 
pattern: the pattern is not the same throughout the period with which we 
are concerned. This is often referred to as a moving seasonal. From a 
chart such as Chart 15.1, it is not always possible to ascertain whether 
the seasonal pattern is fixed or moving. To make this decision, it is 
usually necessary to proceed part way with the seasonal analysis (through 
step 2 of the procedure which follows); luckily, the initial steps are the 
same for the determination of either a constant or a moving seasonal. 

Computation of a moving seasonal index. A moving seasonal 
index may be obtained as follows: 

1. Compute a centered 12-month moving average of the original data. 
Since the procedure is exactly the same as shown in Columns 2, 3, and 4 
of Table 14.5 for the data of newsprint consumption, the computation 
of the moving average is not shown here. However, the moving average 
is shown graphically in Chart 15.1. 

2. Express the original data as percentages of the moving average. 
These figures are shown in Table 15.1. 

3. Plot the data of Table 15.1 on 12 charts, one chart for each month, as 
shown in the 12 parts of Chart 15.2. These 12 monthly charts may be 
drawn on separate sheets of graph paper or on one large sheet, as may be 
convenient. In any event, they should not be too small in view of the 
use which is to be made of them in the next two steps. 



Chart 15.1. Magazine Advertising in the United States, July 1942-June 
1953, and Twelve-Month Centered Moving Average, January 1943-December 
1952. Data from various issues of the Survey of Current Business. Moving average 
computed as shown in Table 14.5. 


4. Reference to the January portion of Chart 15.2 shows that January 
has a downward trend. June, July. August, November, and December 
also have downward trends. Several months show upward trends, for 
example, March, April, September, and October. The monthly trends 
may be linear or non-linear. Also (although Chart 15.2 does not show 
a good example of this) a month may have a trend whic^h declines and then 
rises, or vice versa. The fourth step consists of determining a trend for 
each of the 12 monthly charts. This may be done by drawing freehand 
trend lines, by fitting mathematical curves, or by using a moving average 
(for example, a five-term moving average) as a guide and smoothing the 
moving average freehand. However the trend lines are obtained, 
they should be relatively simple curves and should not slope too steeply, 
up or down, at the ends. It must be realized that the trends we are con- 
cerned witli here are not affected by the same forces that are associated 
with secular trend. The monthly trends are very unlikely to continue in 
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To avoid obscuring details, those charts show no guide lines. 
When used to aid in the computation of a moving seasonal index, 
charts such as these would have finely ruled grids. The values 
in Table 15.2 are rea^ from the smooth curves. The values in 
Table 15.3 are the dots which are just above, just below, or on 
the smooth curves. . 
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a given direction indefinitely, but are more likely to move to a certain 
level and then remain more or less stable until new factors bring about a 
change from that level. For purposes of illustration, the 12 trend lines 
in Chart 15.2 were drawn freehand. If we wish to have a seasonal index 
for a year later than that shown in a chart such as Chart 15.2, in order to 
deseasonaliise the monthly data as they become available, we may use the 
seasonal index for the last year shown (as is done in Table 16.3) or we 
may extend the monthly trend lines. 

5. From the monthly charts of Chart 15.2, read the trend values and 
enter them in a table. These are first approximations of the moving
seasonal and are shown in Table 15.2. 



Chart 15.3. Moving Seasonal Index for Magazine Advertising in the United
States, 1943-1952. Data from Table 15.3.

6. It will be noticed that the 12 values for each year, shown in Table
15.2, in no instance total 1,200.0. The final step consists of adjusting the
first approximation figures of Table 15.2 so that each annual total will be
1,200.0, but at the same time retaining smooth, well-fitting trends for the
12 parts of Chart 15.2. The results of this step are shown in Chart 15.2
by means of dots and in Table 15.3, which gives the moving seasonal
index. Note that the total for each year is now 1,200.0. If the 12
monthly trend lines are linear, they may be fitted mathematically by a
procedure¹ which automatically results in the annual totals each being
1,200.0.
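
The arithmetic side of step 6 can be sketched as follows. This is only an illustration under a stated assumption: it scales the 12 values proportionally, whereas the adjustment described above is made by judgment so that the monthly trends remain smooth. The function name and the sample figures are hypothetical and are not taken from Table 15.2.

```python
def scale_to_1200(first_approx):
    """Scale 12 first-approximation seasonal values so that they total about 1,200.0.

    Assumption: simple proportional scaling. The text adjusts by judgment,
    keeping each monthly trend smooth; this is only the arithmetic part.
    """
    if len(first_approx) != 12:
        raise ValueError("expected 12 monthly values")
    factor = 1200.0 / sum(first_approx)
    return [round(value * factor, 1) for value in first_approx]


# Hypothetical first approximations for one year (not from Table 15.2).
sample = [88.0, 97.5, 108.0, 112.5, 110.0, 96.0,
          72.0, 74.5, 101.0, 118.0, 116.5, 103.0]
adjusted = scale_to_1200(sample)
print(sum(sample), round(sum(adjusted), 1))
```

Because each value is rounded to one decimal place, the scaled total may differ from 1,200.0 by a tenth or so; that remainder would still have to be placed by hand in whichever month it disturbs the trend least.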

The moving seasonal pattern for magazine advertising is shown
graphically in Chart 15.3. Note how the relative importance of March
and April changes over the period; note also that the mid-year low, which
was June in 1943, gradually shifts to July. Another interesting point
brought out clearly in Chart 15.3 is the gradual change in the amplitude
of the seasonal variation over the period.

¹ See R. J. Foote and Karl A. Fox, Seasonal Variation: Methods of Measurement and
Tests of Significance, pp. 6-7, issued September 1952 by the Bureau of Agricultural
Economics as Agricultural Handbook No. 48.

The reader may have noted that steps 4 and 6 in the determination 
of a moving seasonal index may involve subjective considerations. This 
does not constitute a weakness in the procedure, but it does suggest that 
better results are more likely to be obtained by an experienced worker 
who is familiar with the series being studied than by one not so well 
equipped. The procedure for obtaining a moving seasonal index, which 
has been described in the preceding paragraphs, is occasionally modified 
by using a 12-month moving average, not centered, but arbitrarily placed
opposite the seventh (or sixth) month. 

If a series that contains a moving seasonal is deseasonalized by a con- 
stant seasonal index, the adjusted data will contain not only the irregular 
movements actually present in the series, but additional irregularities 
where the constant seasonal index has undercorrected or overcorrected. 
Unless one knows that the series with which he is working has a fixed 
seasonal movement, it is always wise to make the 12 monthly charts of 
Chart 15.2. These will reveal whether a moving seasonal is present; if 
the seasonal is constant, the trends will be horizontal lines. 

Footnote 5 of Chapter 14 pointed out that a 12-month moving average 
may not move high enough into cyclical peaks or low enough into cyclical 
troughs. Partly to correct for this characteristic of the moving average, 
the Division of Research and Statistics of the Board of Governors of the 
Federal Reserve System uses a more complex procedure² than the one
just illustrated. Here are the bare outlines of the Federal Reserve 
method: 

The main nonseasonal movements are determined as follows — 

1. Compute a 12-month moving average centered at the seventh 
month. 

2. Plot the original data and moving average on an arithmetic grid. 

3. Draw a freehand curve through the curve of the original data, 
wherever the moving-average curve seems to fail adequately to 
describe the main nonseasonal movements. 

4. Read and record the monthly values from the moving-average curve 
as modified. 

Typical differences between the unadjusted values and the main non-
seasonal movements are next obtained:

5. Express the original values as percentages of the values of the main 
nonseasonal series obtained in step 4. 

² For a full description, see “Adjustment for Seasonal Variation,” by H. C. Barton,
Federal Reserve Bulletin, June 1941, pp. 518-528.





6. Make 12 monthly charts (one each for January, February, March,
and so on) of the ratios obtained in step 5. 

7. Draw a freehand trend line for each monthly chart. This is termed 
“averaging the ratios for each month.”

8. Read the values from the freehand lines of step 7 and adjust the 12 
values for each calendar year so that they total 1200 or “depart
from this total by no more than an amount that can be accounted for
by some special circumstance affecting the series.” These are the
preliminary seasonal indexes. 

9. Using the figures of step 8, compute a preliminary series adjusted for 
seasonal variation. 


The preliminary index is then revised — 

10. Plot the preliminary adjusted series on the chart of step 2. 

11. Repeat steps 3 through 10 for all locations where the original free- 
hand curve departs from the general movements of the preliminary 
adjusted series. This procedure results in revised preliminary sea- 
sonal indexes and a revised preliminary adjusted series. 

12. Plot the revised preliminary adjusted series on a year-over-year 
chart similar to Chart 14.5. 

13. Examine the year-over-year chart by reading it vertically to see 
whether there are months (or groups of months) showing recurring 
movements of a seasonal nature. 

14. Make a final revision of the seasonal values of step 11 (modifying the
curves of step 7) to eliminate, as far as possible, all recurring move-
ments, shown in the year-over-year chart, that seem to be seasonal in
nature. The 12 values for each calendar year should ordinarily con-
tinue to total 1200. A final check of the deseasonalized data may be
made on a year-over-year chart.
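
The purely mechanical parts of this outline, steps 1 and 5, can be sketched as below. The freehand modification of steps 3 and 4 is a matter of judgment and is not reproduced, so here the unmodified moving average stands in for the "main nonseasonal series"; that substitution, the function names, and the sample data are our assumptions.

```python
def moving_average_12(values):
    """12-month moving averages, each placed opposite the seventh month of its span.

    Returns a list aligned with `values`; months with no average hold None.
    """
    placed = [None] * len(values)
    for start in range(len(values) - 11):
        placed[start + 6] = sum(values[start:start + 12]) / 12.0
    return placed


def percent_of_base(values, base):
    """Step 5: original values as percentages of the nonseasonal series."""
    return [None if b is None else 100.0 * v / b for v, b in zip(values, base)]


# Two years of hypothetical monthly data.
data = [90, 95, 108, 112, 110, 97, 75, 78, 104, 120, 118, 105,
        94, 99, 113, 118, 115, 101, 78, 82, 109, 126, 124, 110]
average = moving_average_12(data)
ratios = percent_of_base(data, average)
print([round(r, 1) for r in ratios if r is not None])
```

The ratios for a given month, taken year by year, are the points that would be plotted on that month's chart in step 6.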

It must be clear that the Federal Reserve procedure differs in two 
respects from the method used in this text: first, the moving average 
(which is not centered) is modified by a freehand curve; and second, the 
seasonal index first obtained (step 8) is twice revised. This method 
requires knowledge of the field represented by the data and a high order 
of judgment. In the words of the article mentioned in footnote 2, it
“requires a higher grade of work and somewhat more time than most
mechanical methods.” For the less erratic series, it was found that
determining and eliminating seasonal for data covering a 14-year period
required about a half-day’s work of a professional nature and two days of
clerical work. The author of the article adds: “Time spent in this way,
however, yields more accurate seasonal adjustments than can be obtained
by applying an inflexible mathematical process, and in addition yields a
knowledge of other characteristics of the underlying series that is valuable
on its own account.”

SUDDEN VARIATIONS IN SEASONAL PATTERNS

Seasonal patterns may change abruptly, rather than gradually, and 
then the device of a moving seasonal would be inapplicable. Such 
changes may involve merely the relative importance of two consecutive 
months, or may involve a change in the entire pattern. The most fre-
quently encountered change of the first type is that occasioned by the
varying date of Easter.

Adjustment for Easter.³ A number of statistical series are affected
materially by changes in the date of Easter, which may range from
March 22 to April 25. Retail sales and money in circulation are two of
the series so affected. Department store sales, in particular, show the 
effects of the customary apparel purchases before Easter. A late Easter 
will tend to make April sales heavy relative to March, and, within limits, 
the later in April that Easter occurs, the greater is this tendency. On the 
other hand, when Easter occurs in March, March sales and possibly 
February sales will be increased. 

A procedure used by the Federal Reserve System for making Easter 
adjustments in the department store sales series is as follows: 

1. Compute preliminary seasonal adjustment factors.⁴ These should
eliminate, so far as possible, seasonal fluctuations other than those caused 
by changes in the date of Easter. If a moving seasonal has been com- 
puted, the factors will vary from year to year, as shown in Columns 3 
and 6 of Table 15.4. 

2. Using these factors, compute seasonally adjusted index numbers for
March and April of each year. These are shown in Columns 4 and 7 of 
Table 15.4. 

3. Next, compute the percentage change from March to April in these
preliminary seasonally adjusted indexes. These changes, which are shown
in Column 8 of Table 15.4, do not, however, reflect solely the influence of
Easter, but also the general trend of the series in the course of cyclical, 
secular, or short-term movements. Therefore, it is necessary to adjust 
further for short-term trends. 

4. Derive approximate adjustments for short-term trend. If the method 
of seasonal adjustment used is that described in the June 1941 Federal 
Reserve Bulletin, the March and April figures for each year can be read 

³ This section was prepared initially by Robert E. Lewis, formerly economist with
the Federal Reserve Bank of New York and now economist with the National City
Bank of New York. The procedure is taken in part from pp. 1472-1473 of the
December 1951 Federal Reserve Bulletin. The examples shown are based on the
experience of the Federal Reserve Bank of New York.

⁴ In this instance, the procedure used was that described on the preceding pages
and referred to as the Federal Reserve method.

TABLE 15.4

Computation of March-April Percentages of Change in Preliminary Season-
ally Adjusted Indexes of Department Store Sales in the Second Federal
Reserve District, 1919-1953

Columns:
(1) Year
(2) March: unadjusted index,* 1947-49 = 100
(3) March: changing seasonal
(4) March: preliminary seasonally adjusted index,* Col. (2) ÷ Col. (3)
(5) April: unadjusted index,* 1947-49 = 100
(6) April: changing seasonal
(7) April: preliminary seasonally adjusted index,* Col. (5) ÷ Col. (6)
(8) Per cent change, March to April, Col. (7) ÷ Col. (4)

 (1)     (2)    (3)     (4)     (5)    (6)     (7)     (8)

1919    27.2    95     28.6    32.9   100     32.9   +15.0
  20    39.2    95     41.3    39.0   100     39.0    -5.6
  21    37.9    94     40.3    38.9   100     38.9    -3.5
  22    35.2    92     38.3    41.1   100     41.1    +7.3
  23    39.3    91     43.2    42.0   100     42.0    -2.8
  24    38.8    90     43.1    44.6   100     44.6    +3.5
  25    40.9    89     46.0    45.7    99     46.2    +0.4
  26    42.0    89     47.2    45.6    98     46.4    -1.7
  27    42.2    89     47.4    49.2    98     50.2    +5.9
  28    43.1    89     48.4    47.1    98     48.1    -0.6
  29    48.6    89     54.6    48.3    98     49.3    -9.7
1930    45.5    89     51.1    53.1    98     54.2    +6.1
  31    45.1    89     50.7    49.1    98     50.1    -1.2
  32    35.3    89     39.7    38.3    98     39.1    -1.5
  33    27.9    89     31.3    35.9    98     36.6   +16.9
  34    36.8    89     41.3    35.9    98     36.6   -11.4
  35    33.0    89     37.1    36.6    98     37.3    +0.5
  36    35.4    89     39.8    39.0    98     39.8     0
  37    39.3    89     44.2    40.7    98     41.5    -6.1
  38    34.6    87     39.8    40.7    98     41.5    +4.3
  39    35.6    87     40.9    40.1    98     40.9     0
1940    37.0    87     42.5    38.5    98     39.3    -7.5
  41    39.2    87     45.1    46.6    98     47.4    +5.1
  42    48.4    90     53.8    49.3    97     50.8    -5.6
  43    47.2    94     50.2    53.3    97     54.9    +9.4
  44    57.0    98     58.2    56.4    96     58.8    +1.0
  45    72.4    99     73.1    58.9    96     61.4   -16.0
  46    85.2    99     86.1    90.5    96     94.3    +9.5
  47    94.7    98     96.6    92.5    96     96.4    -0.2
  48    97.3    95    102.4    98.6    96    102.7    +0.3
  49    86.4    89     97.1    99.3    96    103.4    +6.5
1950    86.7    89     97.4    93.9    96     97.8    +0.4
  51    94.8    89    106.5    95.6    96     99.6    -6.5
  52    87.9    89     98.8    96.5    96    100.5    +1.7
  53    93.3    89    104.8    95.5    96     99.5    -5.1

* While department store indexes are not ordinarily published to one decimal place, an exception has
been made in this case in order to avoid distortion due to rounding in the comparisons for the early
years.

Data from Federal Reserve Bank of New York.

TABLE 15.5

Determination of Net Easter Changes for Department Store Sales in the
Second Federal Reserve District, 1919-1953

Columns:
(1) Year
(2) Determination of short-term trend:* March
(3) Determination of short-term trend:* April
(4) Determination of short-term trend:* per cent short-term change,
    March to April, Col. (3) ÷ Col. (2)
(5) Per cent change, March to April, from Table 15.4
(6) Net Easter changes, Col. (5) - Col. (4)
(7) Date of Easter

 (1)     (2)     (3)     (4)     (5)     (6)    (7)

1919    30.9    31.7    +2.6   +15.0   +12.4    April 20
  20    40.9    41.1    +0.5    -5.6    -6.1    April 4
  21    39.8    39.6    -0.5    -3.5    -3.0    March 27
  22    39.1    39.4    +0.8    +7.3    +6.5    April 16
  23    42.2    42.5    +0.7    -2.8    -3.5    April 1
  24    44.3    44.3     0      +3.5    +3.5    April 20
  25    46.2    46.3    +0.2    +0.4    +0.2    April 12
  26    48.3    48.3     0      -1.7    -1.7    April 4
  27    49.1    49.0    -0.2    +5.9    +6.1    April 17
  28    49.6    49.6     0      -0.6    -0.6    April 8
  29    52.0    52.2    +0.4    -9.7   -10.1    March 31
1930    52.8    52.7    -0.2    +6.1    +6.3    April 20
  31    49.6    49.6     0      -1.2    -1.2    April 5
  32    39.4    38.6    -2.0    -1.5    +0.5    March 27
  33    33.8    34.1    +0.9   +16.9   +16.0    April 16
  34    36.8    36.8     0     -11.4   -11.4    April 1
  35    37.2    37.3    +0.3    +0.5    +0.2    April 21
  36    39.9    40.2    +0.8     0      -0.8    April 12
  37    43.5    43.6    +0.2    -6.1    -6.3    March 28
  38    41.0    40.6    -1.0    +4.3    +5.3    April 17
  39    40.2    40.4    +0.5     0      -0.5    April 9
1940    42.2    42.3    +0.2    -7.5    -7.7    March 24
  41    46.6    47.0    +0.9    +5.1    +4.2    April 13
  42    51.2    51.4    +0.4    -5.6    -6.0    April 5
  43    54.1    54.3    +0.4    +9.4    +9.0    April 25
  44    58.4    59.0    +1.0    +1.0     0      April 9
  45    66.8    67.2    +0.6   -16.0   -16.6    April 1
  46    86.5    89.0    +2.9    +9.5    +6.6    April 21
  47    97.1    98.0    +0.9    -0.2    -1.1    April 6
  48   102.4   102.8    +0.4    +0.3    -0.1    March 28
  49    99.9    99.1    -0.8    +6.5    +7.3    April 17
1950    96.7    97.6    +0.9    +0.4    -0.5    April 9
  51   107.0   106.8    -0.2    -6.5    -6.3    March 25
  52   100.5   100.3    -0.2    +1.7    +1.9    April 13
  53   102.4   102.4     0      -5.1    -5.1    April 5

* Values in Columns 2 and 3 were read from a chart, as explained in step 4.

Data in Columns 2, 3, and 4 from Federal Reserve Bank of New York. Figures in Column 5 from
Table 15.4.
from the chart of the revised freehand curve. Alternatively, the March
and April figures can be read from a freehand curve drawn through a chart 
of the preliminary seasonally adjusted series. Percentage changes 
between these March and April figures are then computed to give a rough 
measure of the month-to-month movement attributable to short-term 
trend. See Columns 2, 3, and 4 in Table 15.5. 

5. Obtain net Easter changes by subtracting algebraically the short-term
trend percentages from the original March-April changes computed in
step 3. In other words, the original changes are lowered slightly when
the general movement or trend of the seasonally adjusted index during
the first half of the year is upward, and they are raised slightly when the
general movement is downward. These net Easter changes are shown
in Column 6 of Table 15.5.

Chart 15.4. Date of Easter and Net Easter Change for Department-
Store Sales in the Second Federal Reserve District, 1919-1953. Data from
Table 15.5.

6. To confirm that these net Easter changes actually do vary in 
accordance with the date of Easter, we have plotted, year by year, these 
changes and the date of Easter. (See Chart 15.4, which uses data from 
Table 15.5.) It is apparent that there is a marked tendency for April to 
show a greater percentage increase over March when Easter is late and a 
smaller increase or a decline when Easter is early. However, this chart






does not tell us how much on the average April sales are increased over 
those of March for each additional day later that Easter occurs. Such 
an estimate can be obtained by plotting the net Easter changes, not by 
years, but with the Easter date along the horizontal axis, as in Chart 15.5.

7. Fit a freehand trend line to the data shown in Chart 15.5. The esti-
mating line may be fitted mathematically if desired, but it would seem
preferable to be able to discount those years when unusual factors
affected data for March and April. (For department stores, sales were
reduced in March 1933 because of the bank holiday and in April 1945
because many stores were closed at the time of President Roosevelt's
death.)

Chart 15.5. Net Easter Change in Relation to Date of Easter and Graphic
Estimate of Gross Easter Correction Factor for Department-Store Sales in
the Second Federal Reserve District, 1919-1953. Data from Table 15.5. The
curve serves as a guide for determining the stepped line, from which the gross cor-
rection factors are read. These are then entered in Column 2 of Table 15.6.

It should be noted that this line is horizontal throughout March. If 
Easter occurs at any time from March 22 through April 1, no pre-Easter
sales will be made in April, no matter when in this period Easter occurs.
Conceivably, a very early Easter could mean increased February sales
relative to March, but the difference is not ordinarily great enough to 
warrant a special adjustment. 

TABLE 15.6

Easter Correction Factors for Department Store
Sales in the Second Federal Reserve District

Columns:
(1) Date of Easter
(2) Gross correction factor
(3) Net correction factor for March
(4) Net correction factor for April

   (1)        (2)    (3)    (4)

March 22      -6     +3     -3
March 23      -6     +3     -3
March 24      -6     +3     -3
March 25      -6     +3     -3
March 26      -6     +3     -3
March 27      -6     +3     -3
March 28      -6     +3     -3
March 29      -6     +3     -3
March 30      -6     +3     -3
March 31      -6     +3     -3
April 1       -6     +3     -3
April 2       -6     +3     -3
April 3       -6     +3     -3
April 4       -6     +3     -3
April 5       -4     +2     -2
April 6       -4     +2     -2
April 7       -4     +2     -2
April 8       -2     +1     -1
April 9       -2     +1     -1
April 10       0      0      0
April 11       0      0      0
April 12       0      0      0
April 13      +2     -1     +1
April 14      +2     -1     +1
April 15      +4     -2     +2
April 16      +4     -2     +2
April 17      +6     -3     +3
April 18      +6     -3     +3
April 19      +6     -3     +3
April 20      +8     -4     +4
April 21      +8     -4     +4
April 22      +8     -4     +4
April 23      +8     -4     +4
April 24      +8     -4     +4
April 25      +8     -4     +4

The data of Column 2 were read from Chart 15.5.


8. Read off the gross correction factor for each date of Easter from the
trend line to the nearest even number. These figures are shown in
Column 2 of Table 15.6.

9. Divide the gross correction factor by two to obtain the net correction
factors. April sales gain by a late Easter what March sales lose, and vice

TABLE 15.7

Adjustment of March and April Seasonal Index Numbers of Department
Store Sales in the Second Federal Reserve District for Variation in the
Date of Easter, 1919-1956

Columns:
(1) Year
(2) Date of Easter
(3) Net correction factor*
(4) March seasonal: uncorrected
(5) March seasonal: corrected, Col. (4) - Col. (3)
(6) April seasonal: uncorrected
(7) April seasonal: corrected, Col. (6) + Col. (3)

 (1)    (2)         (3)    (4)    (5)    (6)    (7)

1919    April 20    +4     95     91    100    104
  20    April 4     -3     95     98    100     97
  21    March 27    -3     94     97    100     97
  22    April 16    +2     92     90    100    102
  23    April 1     -3     91     94    100     97
  24    April 20    +4     90     86    100    104
  25    April 12     0     89     89     99     99
  26    April 4     -3     89     92     98     95
  27    April 17    +3     89     86     98    101
  28    April 8     -1     89     90     98     97
  29    March 31    -3     89     92     98     95
1930    April 20    +4     89     85     98    102
  31    April 5     -2     89     91     98     96
  32    March 27    -3     89     92     98     95
  33    April 16    +2     89     87     98    100
  34    April 1     -3     89     92     98     95
  35    April 21    +4     89     85     98    102
  36    April 12     0     89     89     98     98
  37    March 28    -3     89     92     98     95
  38    April 17    +3     87     84     98    101
  39    April 9     -1     87     88     98     97
1940    March 24    -3     87     90     98     95
  41    April 13    +1     87     86     98     99
  42    April 5     -2     90     92     97     95
  43    April 25    +4     94     90     97    101
  44    April 9     -1     98     99     96     95
  45    April 1     -3     99    102     96     93
  46    April 21    +4     99     95     96    100
  47    April 6     -2     98    100     96     94
  48    March 28    -3     95     98     96     93
  49    April 17    +3     89     86     96     99
1950    April 9     -1     89     90     96     95
  51    March 25    -3     89     92     96     93
  52    April 13    +1     89     88     96     97
  53    April 5     -2     89     91     96     94
  54    April 18    +3     89     86     96     99
  55    April 10     0     89     89     96     96
  56    April 1     -3     89     92     96     93

* To be added algebraically to April and subtracted algebraically from March.

Data in Columns 2 and 3 from Table 15.6. Figures in Columns 4 and 6 from Federal Reserve Bank
of New York.
versa; therefore, half of the gross correction factor is subtracted from one
month and half is added to the other. These net factors are shown in 
Columns 3 and 4 of Table 15.6. 

10. Finally, add the net correction factors algebraically to the April
seasonal adjustment factors and subtract them algebraically from the March
factors, as in Table 15.7. The resulting seasonal factors are the ones 
applied to the unadjusted index numbers to obtain the published season- 
ally adjusted series. 
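
The arithmetic of steps 2, 3, 5, 9, and 10 can be traced for a single year. The sketch below uses the 1919 figures as printed in Tables 15.4 and 15.5 and the gross factor for an April 20 Easter; the variable and function names are ours, and the rounding to one decimal place is an assumption made to match the tables.

```python
def pct_change(later, earlier):
    """Percentage change from `earlier` to `later`, to one decimal place."""
    return round((later / earlier - 1.0) * 100.0, 1)


# 1919 figures as printed in Tables 15.4 and 15.5.
mar_unadj, mar_seasonal = 27.2, 95
apr_unadj, apr_seasonal = 32.9, 100
mar_trend, apr_trend = 30.9, 31.7   # read from the freehand curve (step 4)

# Step 2: preliminary seasonally adjusted indexes.
mar_adj = round(100.0 * mar_unadj / mar_seasonal, 1)
apr_adj = round(100.0 * apr_unadj / apr_seasonal, 1)

# Step 3: March-to-April change; step 5: net Easter change.
change = pct_change(apr_adj, mar_adj)
short_term = pct_change(apr_trend, mar_trend)
net_easter = round(change - short_term, 1)

# Steps 9 and 10: Easter fell on April 20, for which the gross factor is +8.
gross = 8
net = gross / 2
mar_corrected = mar_seasonal - net   # subtracted algebraically from March
apr_corrected = apr_seasonal + net   # added algebraically to April

print(mar_adj, apr_adj, change, short_term, net_easter)
print(mar_corrected, apr_corrected)
```

Steps 6 through 8, in which the gross factor is read from the stepped line of Chart 15.5, rest on a freehand fit and are therefore taken here as given.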

Once a satisfactory set of Easter adjustments has been derived, the 
entire set of computations need not be repeated each year. Easter cor-
rection factors can be read from a table, such as Table 15.6, and applied
to the basic seasonal factors, as projected in Table 15.7 through 1956.
Every few years, however, the additional experience should be used to
review the adequacy of the Easter adjustment, just as in the case of
changing basic seasonal factors.⁵
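
Reading the factors from a table amounts to a step-function lookup on the date of Easter. The sketch below encodes the steps of Table 15.6 for the Second Federal Reserve District; the function names and the layout of the lookup list are hypothetical, and the figures would need review for any other series or period.

```python
from datetime import date

# Last April day covered by each step and its gross factor, from Table 15.6.
# Any Easter in March (the 22nd or later) takes the first value, -6.
APRIL_STEPS = [(4, -6), (7, -4), (9, -2), (12, 0),
               (14, 2), (16, 4), (19, 6), (25, 8)]


def gross_factor(easter):
    """Gross Easter correction factor for a given date of Easter."""
    if easter.month == 3 and easter.day >= 22:
        return -6
    if easter.month == 4:
        for last_day, factor in APRIL_STEPS:
            if easter.day <= last_day:
                return factor
    raise ValueError("Easter must fall between March 22 and April 25")


def net_factors(easter):
    """Net correction factors (March, April): half the gross, opposite signs."""
    half = gross_factor(easter) // 2   # gross factors are even, so this is exact
    return -half, half


print(net_factors(date(1954, 4, 18)))
print(net_factors(date(1951, 3, 25)))
```

The April value is the net correction factor shown in Column 3 of Table 15.7, which is added to the April seasonal and subtracted from the March seasonal.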

Sudden changes in entire seasonal pattern. Prior to 1935 it had 
been customary to hold the New York automobile show in January of 
each year. It was mentioned, in Chapter 11, that in 1935 a show was 
held, not only in January, but also in November, the November show
being in lieu of the show originally planned for January 1936. For some 
years thereafter the show was held in November. The importance of the 
New York show stemmed from the fact that it was at these shows that 
most new models of automobiles were revealed to the public. Before 
1935 the seasonal movement of automobile sales showed a high in the 
spring (a few months after the show) and a low in the fall and winter. 
From 1935 until the beginning of World War II, two seasonal highs each 
year were in evidence, one in the spring and one very late in the year. 

When a sudden change in an entire seasonal pattern occurs, it is merely 
necessary to compute two seasonal indexes, one for the period preceding 
the change and one for the years following the change. The two indexes 
may be either constant or changing, whichever is appropriate for the series. 

Short-time shifts in timing. The varying date of Easter affects
materially only March and April; changing the date of the automobile
show affected chiefly a few months preceding and following it. Weather
conditions, however, which also vary from year to year, may result in
early harvests one year and late harvests the next; and not only may the 
marketing of the product begin at different times in different years, but 


⁵ An interesting method of making adjustments for the changing date of Easter in
weekly seasonal adjustment factors has been worked out by the Federal Reserve Bank
of Cleveland. See “Description of Method of Computation of the Weekly Index of
Department Store Sales, Fourth Federal Reserve District,” pp. 4-9 (mimeographed,
July 1952).
the flow of goods during the entire year may be affected, the effect being
to shift the whole pattern a few months to the left or right. Likewise, 
consumer demand may vary in timing, depending on how early the 
weather changes. 

Such shifting seasonal patterns present a difficult problem. Perhaps 
the most practical solution is to regard the situation as a special case of a 
sudden change in entire pattern, to group together the years (not neces- 
sarily adjacent) which show the same timing in their seasonal turns, and 
to compute as many seasonal indexes as there are groups of years. In 
computing such indexes, there is no reason why the calendar year must be 
taken as a unit. Rather, if the subject matter has to do with agriculture, 
the year should be related to the crop year. Perhaps the central month 
should be the seasonal high or the seasonal low. 

Varying amplitude. Some economic series retain more or less the 
same general seasonal pattern from year to year but have a tendency to 
vary either gradually or suddenly in amplitude. This is particularly 
true of stocks of agricultural commodities. For example, stocks of agri- 
cultural crops show varying seasonal amplitude from year to year depend- 
ing upon the amount carried over from the preceding year, the size of the 
harvest, and the amount consumed. Likewise, shipments of livestock 
are likely to vary in the amplitude of their seasonal swing. Here the 
variation may have something to do with the advantage of immediately 
selling the livestock, as compared with holding them for further fattening 
or a price increase. Since the relative advantages of these policies 
(discussed on page 145) are likely to vary in cycles, so the amplitude of the 
seasonal variation is likely to change in cycles, and the change in pattern 
might conceivably be treated as a moving seasonal. Another case is that 
of increased seasonal amplitude in manufacturing, brought about by a 
general cyclical tendency toward hand-to-mouth buying. It is apparent 
that this change also might be thought of as a moving seasonal, the 
progression being cyclical rather than trend-like. 

It must be apparent that, when the amplitude of a seasonal movement 
is not changing gradually but changing suddenly, and in the main unpre- 
dictably, a moving seasonal cannot overcome the difficulty any better 
than it can that of short-time shifts in the entire seasonal pattern. Any 
of the types of seasonal indexes hitherto described would in some years
show too great amplitude and in other years too small amplitude. The 
method of correcting a seasonal index for sudden changes in amplitude 
is somewhat akin to the adjustment for the changing date of Easter. It 
will not be described in detail in this volume,⁶ but in general the procedure

⁶ For a description, with tables and charts, see the first edition of this text, pp.
518-524.
consists of determining the relationship that exists for the 12 months of
each year between (1) the seasonal index expressed as deviations from 100
and (2) the percentage deviations of the original values from the 12-month
centered moving average, the latter percentage deviations being adjusted
to average zero. The relationship between the 12 pairs of values for each
year yields an amplitude ratio which indicates the correction to be applied 
(by multiplication) to the original seasonal values expressed as deviations 
from 100. To each of these deviations 100 is then added. 
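
Since the detailed method is not given in this volume, the sketch below is only one way the amplitude ratio might be computed. It assumes that the "relationship" between the 12 pairs of values is summarized by the slope of a least-squares line through the origin; that choice, the function names, and the sample figures are all ours.

```python
def amplitude_ratio(seasonal_index, pct_of_moving_avg):
    """Amplitude ratio for one year.

    Assumption: the ratio is the least-squares slope, through the origin, of
    (2) the year's percentage deviations from the moving average, adjusted to
    average zero, on (1) the seasonal index expressed as deviations from 100.
    """
    s_dev = [s - 100.0 for s in seasonal_index]
    d = [p - 100.0 for p in pct_of_moving_avg]
    mean_d = sum(d) / len(d)
    d = [x - mean_d for x in d]              # adjusted to average zero
    return sum(a * b for a, b in zip(s_dev, d)) / sum(a * a for a in s_dev)


def corrected_index(seasonal_index, ratio):
    """Multiply the deviations from 100 by the ratio, then add 100 to each."""
    return [100.0 + ratio * (s - 100.0) for s in seasonal_index]


# Hypothetical stable seasonal index (totals 1,200) and one year's percentages
# of the moving average, built here with 1.5 times the usual amplitude.
seasonal = [90, 95, 105, 110, 108, 100, 85, 88, 102, 112, 110, 95]
year_pct = [100.0 + 1.5 * (s - 100) for s in seasonal]

ratio = amplitude_ratio(seasonal, year_pct)
print(round(ratio, 3))
print([round(v, 1) for v in corrected_index(seasonal, ratio)])
```

With real data the 12 pairs will not lie exactly on a line, and the irregular movements in that year will affect the ratio obtained.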

A word of caution may be in order: if a moving seasonal has been used, 
a change in the amplitude ratio does not necessarily indicate a change 
in the seasonal amplitude of the original data. A gradual increase in 
the seasonal amplitude, for instance, would be reflected in the moving 
seasonal index rather than in the amplitude ratio; but the moving seasonal 
would fail to register any sudden departures from the general trend in 
amplitude change. 

FURTHER REFINEMENTS OF METHOD 

Continuity of seasonal indexes. A stable seasonal index averages 
100 per cent, not only for the 12-month period selected for the index, but 
for any consecutive 12-month period. The latter, however, is not true 
for any of the seasonals explained in this chapter, though in the case of a 
progressive or moving seasonal the discrepancy is nominal only. Par- 
ticularly in the case of seasonal indexes corrected for variations in ampli- 
tude, however, the discrepancy may assume alarming proportions. The 
difficulty manifests itself in discontinuity of the seasonally adjusted data 
at the point where one year ends and the next begins. Let us assume, for 
instance, that the unadjusted seasonal index numbers for December 1952 
and January 1953 are each 80 per cent, the amplitude adjustment to be 
applied, let us say, to calendar years. Now, suppose further that the 
amplitude ratios are 0.5 and 1.5, respectively. This makes the adjusted 
December 1952 index number 40 per cent and the January 1953 number 
120. It is apparent that there will be an enormous drop in the seasonally 
adjusted data between December and January. Yet a little thought will 
convince one that the change in amplitude does not take place entirely 
in a month's time, but represents a transition of several months’ duration.

Although there is no entirely satisfactory solution for this difficulty, 
one remedy, which is very laborious, is to compute an amplitude ratio for 
each consecutive 12-month period of the entire series. For instance, if 
the data ran from 1943 through 1953, the first 12-month period would run
from January 1943 through December 1943, the second from February 
1943 through January 1944, and so on. Altogether there would be 121 
such 12-month periods and the same number of amplitude ratios. We
could speak of these ratios collectively as a moving amplitude ratio.
Following the analogy of a 12-month moving average, these ratios should 
be centered by a 2-month moving average, leaving 120 amplitude ratios, 
running from July 1943 through June 1953. The seasonal index numbers 
are then multiplied by these amplitude ratios to obtain the final seasonal 
index numbers. 
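
The counts given above (121 periods, then 120 centered ratios running from July 1943 through June 1953) can be checked with a short sketch. The amplitude ratios themselves are placeholders here, and the function names are hypothetical.

```python
def window_count(n_months, width=12):
    """Number of consecutive `width`-month periods in a series of `n_months`."""
    return n_months - width + 1


def center_by_pairs(ratios):
    """2-term moving average, which centers each ratio on a calendar month."""
    return [(a + b) / 2.0 for a, b in zip(ratios, ratios[1:])]


def centered_month(i, first_year=1943):
    """Calendar (year, month) on which the i-th centered ratio falls."""
    offset = i + 6          # the first centered ratio falls on the seventh month
    return first_year + offset // 12, offset % 12 + 1


n_months = 11 * 12                       # January 1943 through December 1953
raw = [1.0] * window_count(n_months)     # placeholder amplitude ratios
centered = center_by_pairs(raw)

print(len(raw), len(centered))
print(centered_month(0), centered_month(len(centered) - 1))
```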

This procedure is laborious, but it is not entirely satisfactory. Al- 
though there is no sharp break in the continuity of the series, it has the 
defect that not any 12 consecutive seasonal index numbers are centered 
on 100 per cent. A less accurate but also much less laborious procedure 
than the one just described is to compute an amplitude ratio for each 
standard year, center the ratio on the sixth or seventh month, and inter- 
polate arithmetically from one year to the next. 
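
The less laborious procedure can be sketched as straight-line interpolation between annual ratios. The example reuses the ratios 0.5 and 1.5 from the illustration above and places each on the seventh month of its year; the function name is hypothetical.

```python
def interpolate_monthly(annual_ratios):
    """Monthly amplitude ratios by arithmetic interpolation between years.

    Each annual ratio is placed on the same month of its year (for example,
    the seventh); the result runs from the first such month through the last.
    """
    monthly = []
    for this_year, next_year in zip(annual_ratios, annual_ratios[1:]):
        step = (next_year - this_year) / 12.0
        monthly.extend(this_year + step * k for k in range(12))
    monthly.append(annual_ratios[-1])
    return monthly


# Ratios of 0.5 for 1952 and 1.5 for 1953, each placed on July.
monthly = interpolate_monthly([0.5, 1.5])
december_1952, january_1953 = monthly[5], monthly[6]
print(round(december_1952, 3), round(january_1953, 3))
```

The ratio now moves by one-twelfth of the annual difference each month, so that the break between December and January is no larger than that between any other pair of adjacent months.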

Combinations of seasonal types. It is frequently true that the 
seasonal variation of a series may be gradually changing in pattern, 
shifting in its timing, or varying in amplitude, or some combination of the 
three. For data showing shifts in timing and changes in amplitude, the 
procedure for obtaining final seasonal indexes might be: (1) break data
into sub-periods according to occurrence of seasonal high; (2) compute 
stable seasonal for each such sub-period; (3) using these seasonal indexes, 
compute amplitude ratios for each year (possibly using the method of 
interpolation described above); (4) multiply the seasonal index numbers
by the appropriate amplitude ratios. 

Other combinations of seasonal behavior may call for different treat- 
ment. Considerable ingenuity is frequently required to measure seasonal 
variation successfully. Unfortunately, there is no way of telling when 
we have arrived at the best solution of the problem. Complexity of 
procedure does not guarantee that the results obtained accurately 
describe the movement which we set out to measure. Particularly if the 
data are originally unreliable, great refinement of method is likely to be 
largely wasted effort. 

Logical basis of methods of construction. With the exception of 
the adjustment for Easter, the methods described in this chapter are more 
or less empirical in nature, depending for their validity upon the results 
which they produce. A method is held to be satisfactory if the 
deseasonalized data: (1) do not show similarity of intra-year pattern (other than 
cyclical) in different years; (2) are not extremely irregular in their 
movements; and (3) are of about the same magnitude as the original data in 
12-month periods. 

The Easter adjustment, on the other hand, attempted to find a 
functional relationship between April sales minus March sales and the date of 
Easter. Carrying this idea further, it might be possible to find a 
numerical relationship over time between length of daylight and sales of 
incandescent lamps; or between temperature and sales of ice; or between a 
combination of temperature and snowfall and sale of galoshes. 
Computation of seasonal indexes by such a method would carry us far into the field 
of correlation, which is treated in Chapters 19-22. Furthermore, it 
would be difficult to measure the importance, let us say, of Christmas by 
correlating sales with some other factor. 

Intermediate between these two types of methods is that which obtains 
a first-approximation seasonal index by an empirical method, and then 
seeks to smooth this index by fitting a curve to the seasonal index numbers 
on the theory that the seasonal movement would present a smooth 
pattern if the period covered were long enough to permit an exact can- 
celling out of all irregular movements. Freehand smoothing of the 
seasonal curve is practiced by a few statisticians. The fitting of a mathe- 
matical curve is not usually advocated. Not only may logical objections 
be raised, but there may be social factors that disturb the smoothness of 
contour inherent in a simple mathematical curve. 



Symbols Used in Chapter 16 


$\beta_1$: lower-case Greek beta, a measure of skewness. See Chapter 10. 

$\beta_2$: lower-case Greek beta, a measure of kurtosis. See Chapter 10. 

C: cyclical. 

I : irregular. 

N : the number of items in a series. 
s: standard deviation. See Chapter 10. 

S: seasonal. 

$\Sigma$: upper-case Greek sigma, meaning "take the sum of." 

T: trend. 

X : a value of the X series. 

y: a cyclical deviation; after irregular movements have been smoothed, 
the deviation of a value in a time series from the combined estimate of 
trend and seasonal. 

$Y_c$: a computed value of the Y series. 

CHAPTER 16 


Analysis of Time Series: 

CYCLICAL MOVEMENTS—ADJUSTING TIME 
SERIES FOR TREND, SEASONAL, AND 
IRREGULAR MOVEMENTS 


In Chapter 11 it was pointed out that monthly time series are typically 
the product of the four important movements: secular trend (T), seasonal 
variation (S), cyclical movements (C), and irregular fluctuations (I). 
Chapters 12 and 13 were devoted to consideration of types of trends, how 
to select the appropriate type, and methods of trend fitting. Chapters 14 
and 15 gave attention to types of seasonal variation and the determination 
of indexes of seasonal variation. In this chapter, we shall first discuss the 
elimination of trend from annual time series data. Following this, both 
seasonal variation and trend will be eliminated from monthly data, and 
irregular movements will be smoothed. The final result will be a set of 
adjusted data showing primarily the cyclical movements of the series. 

ADJUSTING ANNUAL DATA FOR TREND 

It is, of course, obvious that annual data, which show but one figure 
for each year, cannot contain any seasonal variation. Neither can annual 
data show irregular movements, although it is possible for an episodic 
movement (such as one due to a severe strike or a conflagration) to be 
important enough to affect an annual total. 

Chart 16.1. Annual Data of Magazine Advertising in the United States 
Adjusted for Trend, 1915-1953. The 100 per cent base is shown as a broken line 
for 1949-1953 because the trend was fitted to the years 1915-1949 and extended to 
1953. Data of Table 16.1. 

Table 12.2 showed the computations necessary for determining a 
straight-line trend for magazine advertising for 1915-1949. The trend 
values resulting from use of the equation were given in the last column 
of Table 12.2 for 1915-1953. Chart 12.3 showed both the observed 
annual data and the trend. Table 16.1 repeats the observed data for 
1915-1953 and the trend values for the same years. In Table 16.1 we 
have also computed the per-cent-of-trend values for each year. These 
are obtained by dividing each of the original figures by the corresponding 
trend value and multiplying by 100. The results are shown in Chart 
16.1. Annual data provide only very rough indicators of the fluctuations 

TABLE 16.1 

Adjustment for Trend of Data of Magazine Advertising in the United 
States, 1915-1953 

(Original data and trend values in millions of agate lines) 

          Original    Trend     Per cent of trend 
Year      data        values    100(Y ÷ Yc) 
          Y           Yc 

1915      16.9        21.2       79.7 
1916      20.0        21.8       91.7 
1917      21.3        22.4       95.1 
1918      18.6        22.9       81.2 
1919      25.7        23.5      109.4 
1920      33.6        24.1      139.4 
1921      22.3        24.7       90.3 
1922      24.4        25.3       96.4 
1923      30.2        25.9      116.6 
1924      31.4        26.5      118.5 
1925      31.5        27.1      116.2 
1926      35.5        27.7      128.2 
1927      36.5        28.2      129.4 
1928      36.4        28.8      126.4 
1929      40.6        29.4      138.1 
1930      35.8        30.0      119.3 
1931      28.9        30.6       94.4 
1932      21.2        31.2       67.9 
1933      18.7        31.8       58.8 
1934      24.3        32.4       75.0 
1935      25.4        33.0       77.0 
1936      28.5        33.6       84.8 
1937      32.1        34.2       93.9 
1938      25.4        34.7       73.2 
1939      25.6        35.3       72.5 
1940      26.9        35.9       74.9 
1941      27.7        36.5       75.9 
1942      25.7        37.1       69.3 
1943      33.1        37.7       87.8 
1944      42.0        38.3      109.7 
1945      49.0        38.9      126.0 
1946      54.8        39.5      138.7 
1947      50.8        40.0      127.0 
1948      47.8        40.6      117.7 
1949      43.8        41.2      106.3 
1950      45.8*       41.8      109.6 
1951      48.1*       42.4      113.4 
1952      48.3*       43.0      112.3 
1953      50.5*       43.6      115.8 

* Not used for computing trend. 

Original data from various issues of Survey of Current Business. Trend values from Table 12.2. 

of a time series, but Chart 16.1 shows that marked fluctuations have 
occurred in annual magazine advertising linage. 

In Table 16.1 trend was eliminated by division, rather than by 
subtraction. If the trend values had been subtracted from the original 
figures, the result would have been deviations in absolute terms (millions 
of agate lines) rather than relative terms. For most purposes, it is more 
useful to know whether the variations are large, or small, in relation to 
some logical base, such as the trend. Thus, a deviation of 50 is ten times 
as important when judged with respect to a trend value of 200 as it is 
when compared with a trend value of 2,000. 
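
As a minimal sketch of the division just described, the following lines recompute the per cent of trend for four years taken from Table 16.1 and then contrast the same absolute deviation against two trend levels. The function and variable names are ours, chosen for illustration.

```python
# Four rows of Table 16.1: year -> (original data, trend value),
# both in millions of agate lines.
rows = {1915: (16.9, 21.2), 1920: (33.6, 24.1), 1933: (18.7, 31.8), 1953: (50.5, 43.6)}

def per_cent_of_trend(original, trend):
    """Original figure divided by trend value, times 100, to one decimal."""
    return round(100 * original / trend, 1)

pct = {year: per_cent_of_trend(y, yc) for year, (y, yc) in rows.items()}
print(pct)  # {1915: 79.7, 1920: 139.4, 1933: 58.8, 1953: 115.8}

# The same absolute deviation of 50 judged against two trend levels.
relative_to_200 = 100 * 50 / 200     # 25 per cent of trend
relative_to_2000 = 100 * 50 / 2000   # 2.5 per cent of trend
print(relative_to_200 / relative_to_2000)  # 10.0
```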

ADJUSTMENT OF MONTHLY DATA 

Although there are other methods of arriving at estimates of the 
cyclical movements of time series, some of which are mentioned at the end 
of this chapter, the so-called "residual method" is most commonly used. 
This method consists of eliminating seasonal variation and trend, thus 
obtaining the cyclical-irregular movements. Symbolically,¹ 

$(T \times S \times C \times I) \div S = T \times C \times I$ and 
$(T \times C \times I) \div T = C \times I$. 

Next, the data are usually smoothed in order to obtain the cyclical 
movements, which are sometimes termed the cyclical relatives, since they are 
always percentages. It is because the cyclical-irregular or the cyclical 
movements remain as residuals that this procedure is referred to as the 
residual method. 

Deseasonalizing. As pointed out in Chapter 11, a seasonal index 
may be computed for the purpose of studying the seasonal movement 
itself, the objective being to avoid or minimize the consequences of the 
seasonal changes, to smooth out the seasonal fluctuations, or to take 
advantage of them. On the other hand, we may be interested in studying 
a time series undisturbed by seasonal variation, and this we accomplish 
by adjusting the observed data for seasonal variation. 

The computation of a seasonal index and its use to deseasonalize a set 
of monthly data may be but one step in the isolation of cyclical 
movements, the other steps (to be described shortly) being the adjustment for 
trend and the smoothing of irregular movements. Not infrequently, 
however, it may be desired to study economic and business series adjusted 
only for seasonal variation. Thus, businessmen, in making decisions, 
may consider not so much whether their sales are increasing (or 
decreasing) relative to a not-too-easily-visualized combination of trend and 
seasonal movements, but rather in relation to the ordinarily expected 
sales for the particular season of the year. It is of interest that many 
deseasonalized series appear in the Federal Reserve Bulletin, issued by the 
Board of Governors of the Federal Reserve System, and in the Survey of 
Current Business, published by the Office of Business Economics of the 
Department of Commerce. 

¹ The concept of $T \times S \times C \times I$ is more generally useful than is that of 
$T + S + C + I$. This is because S, C, and I tend to remain more nearly constant in 
magnitude relative to trend, rather than in absolute terms. Furthermore, the movements 
are ordinarily more meaningful when considered relative to each other than when 
considered in absolute terms. Thus, it is possible to compute a seasonal index which 
remains constant over a period of years, to determine a seasonal index which is 
changing because of alterations in the relative importance of the months, and to compare 
the percentage fluctuations of cyclical movements. Occasionally series are 
encountered for which better results are obtained if the seasonal movement is considered 
constant in absolute rather than relative terms. This is discussed on pages 372-373. 

Chart 16.2. Consumption of Newsprint by United States Publishers (Solid 
Line) and Deseasonalized Data (Broken Line), 1944-1952. Vertical scale: 
thousands of short tons. Data of Table 16.2. 

Chart 16.3. Year-Over-Year Chart of Deseasonalized Data of 
Consumption of Newsprint by United States Publishers, 1944-1952. 
Data of Table 16.2. 

The elimination of seasonal variation is ordinarily accomplished by 
dividing the original values by the seasonal index (and multiplying 
the results by 100), as shown in Table 16.2 for the data of 
newsprint consumption. That is: $(T \times S \times C \times I) \div S = T \times C \times I$, so 
that the deseasonalized data contain trend, cyclical, and irregular 
movements. The deseasonalized data of Table 16.2, together with the original 
figures of newsprint consumption, are shown in Chart 16.2, where it is 
apparent that the curve of the deseasonalized data is much the smoother 
of the two. Because the period covered consists of but nine years, 
neither the original data nor the deseasonalized data show cyclical 
movements. The data of newsprint consumption were chosen as an 
illustration in Chapter 14, not because they would or would not show cyclical 
movements after seasonal variations were removed, but because the series 
had a clearly defined seasonal which, when tested by drawing the twelve 
monthly charts of the per-cent-of-moving-average data (like Chart 15.2), 
did not appear to change² from year to year. However, the curve of the 
deseasonalized data suggests that the seasonal index may not be quite as 
satisfactory for 1952 as for the earlier years, and if the analysis were to be 
continued for a number of years beyond 1952, the 12 monthly charts 
should, of course, be extended and re-examined. Incidentally, the peaks 
shown in the deseasonalized data for May 1949 and April 1951 do not 
represent residual seasonal fluctuations, but reflect unusually high 
original values for those months, as may be seen in Table 16.2. 
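
The division by the seasonal index can be sketched for a single year. The figures below are the 1944 original data and the constant seasonal index of Table 16.2; the list names are ours.

```python
# Constant seasonal index (per cent), January through December, from Table 16.2.
seasonal_index = [93.9, 90.3, 105.2, 104.7, 105.3, 98.9,
                  88.5, 93.1, 99.3, 110.4, 107.2, 103.5]

# Original newsprint consumption for 1944, thousands of short tons.
original_1944 = [194.7, 176.2, 201.7, 201.1, 197.4, 191.1,
                 174.9, 182.4, 189.6, 218.1, 211.6, 206.0]

# (T x S x C x I) / S = T x C x I; the index is a percentage, hence the factor 100.
deseasonalized_1944 = [round(100 * y / s, 1)
                       for y, s in zip(original_1944, seasonal_index)]
print(deseasonalized_1944)
# [207.3, 195.1, 191.7, 192.1, 187.5, 193.2, 197.6, 195.9, 190.9, 197.6, 197.4, 199.0]
```

The results agree with the deseasonalized column of Table 16.2 for 1944.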


² There was evidence of a slight increase in the seasonal importance of April and 
May and a slight decrease in the importance of July and August, but no clear evidence 
of a changing seasonal movement. 
TABLE 16.2 


Elimination of Seasonal Variations from Data of Consumption of Newsprint 
by United States Publishers, 1944-1952 

(Original and deseasonalized data in thousands of short tons) 

Year and       Original    Seasonal    Deseasonalized data 
month          data        index       Col. 2 ÷ Col. 3 
(1)            (2)         (3)         (4) 

1944 
January        194.7        93.9       207.3 
February       176.2        90.3       195.1 
March          201.7       105.2       191.7 
April          201.1       104.7       192.1 
May            197.4       105.3       187.5 
June           191.1        98.9       193.2 
July           174.9        88.5       197.6 
August         182.4        93.1       195.9 
September      189.6        99.3       190.9 
October        218.1       110.4       197.6 
November       211.6       107.2       197.4 
December       206.0       103.5       199.0 

1945 
January        185.2        93.9       197.2 
February       175.1        90.3       193.9 
March          202.8       105.2       192.8 
April          203.2       104.7       194.1 
May            205.8       105.3       195.4 
June           190.5        98.9       192.6 
July           177.9        88.5       201.0 
August         202.9        93.1       217.9 
September      213.3        99.3       214.8 
October        236.9       110.4       214.6 
November       236.1       107.2       220.2 
December       225.4       103.5       217.8 

1946 
January        221.1        93.9       235.5 
February       223.2        90.3       247.2 
March          267.7       105.2       254.5 
April          259.0       104.7       247.4 
May            261.5       105.3       248.3 
June           259.3        98.9       262.2 
July           243.1        88.5       274.7 
August         257.3        93.1       276.4 
September      265.6        99.3       267.5 
October        292.2       110.4       264.7 
November       291.5       107.2       271.9 
December       294.8       103.5       284.8 

1947 
January        266.4        93.9       283.7 
February       258.4        90.3       286.2 
March          302.7       105.2       287.7 
April          297.5       104.7       284.1 
May            303.0       105.3       287.7 
June           292.7        98.9       296.0 
July           263.7        88.5       298.0 
August         281.1        93.1       301.9 
September      299.8        99.3       301.9 
October        339.3       110.4       307.3 
November       338.0       107.2       315.3 
December       322.1       103.5       311.2 

1948 
January        292.6        93.9       311.5 
February       297.4        90.3       329.3 
March          338.3       105.2       321.6 
April          342.6       104.7       327.2 
May            348.8       105.3       331.2 
June           327.1        98.9       330.7 
July           291.6        88.5       329.5 
August         314.0        93.1       337.3 
September      337.2        99.3       339.6 
October        381.7       110.4       345.7 
November       364.3       107.2       339.8 
December       363.7       103.5       351.4 

1949 
January        332.7        93.9       354.3 
February       308.8        90.3       342.0 
March          366.9       105.2       348.8 
April          368.9       104.7       352.3 
May            392.2       105.3       372.5 
June           349.9        98.9       353.8 
July           313.1        88.5       353.8 
August         318.0        93.1       341.6 
September      356.5        99.3       359.0 
October        399.3       110.4       361.7 
November       378.6       107.2       353.2 
December       372.5       103.5       359.9 

1950 
January        345.1        93.9       367.5 
February       333.2        90.3       369.0 
March          396.9       105.2       377.3 
April          403.8       104.7       386.7 
May            401.9       105.3       381.7 
June           376.6        98.9       380.7 
July           336.8        88.5       380.6 
August         346.8        93.1       372.5 
September      373.8        99.3       376.4 
October        420.8       110.4       381.2 
November       407.9       107.2       380.5 
December       398.3       103.5       384.8 

1951 
January        345.6        93.9       368.1 
February       336.6        90.3       372.8 
March          394.4       105.2       374.9 
April          410.7       104.7       392.3 
May            403.2       105.3       382.9 
June           365.3        98.9       369.4 
July           333.4        88.5       376.7 
August         344.5        93.1       370.0 
September      381.4        99.3       384.1 
October        405.3       110.4       367.1 
November       402.8       107.2       375.7 
December       387.8       103.5       374.7 

1952 
January        345.3        93.9       367.7 
February       336.6        90.3       372.8 
March          399.3       105.2       379.6 
April          393.6       104.7       375.8 
May            404.1       105.3       383.8 
June           379.9        98.9       384.1 
July           329.7        88.5       372.6 
August         341.6        93.1       366.9 
September      379.7        99.3       382.4 
October        426.0       110.4       385.9 
November       417.0       107.2       389.0 
December       386.6       103.5       373.5 

Data from Tables 14.4 and 14.7. 

Test of seasonal. A practical test of a seasonal index is to see whether 
its use has eliminated all of the seasonal movement from the series. A 
chart of the type of Chart 16.2 may be used for this purpose, but a 
year-over-year chart of the deseasonalized data, Chart 16.3, is better. From 
this chart it may be seen that the fluctuations still present in the 
deseasonalized data are largely irregular movements which stand out because of 
the lack of cyclical fluctuations in the series. When residual seasonal 
movements are present in an adjusted series, the curves of a 
year-over-year chart will show some similarity with each other. 

Correction by subtraction of seasonal. It occasionally happens that 
grotesque results are obtained when seasonal is eliminated by dividing by 
a seasonal index. This is especially likely to be the case when the 
seasonal movement typically falls almost to zero at one or more months. 
Then, if in any given year the original data remain materially above zero 
for those months, division by the extremely low seasonal index percentage 
will raise the deseasonalized data to a very sharp peak. Even though a 
seasonal movement may not fall to or near zero, there are rare instances 
in which a seasonal pattern may be constant in absolute rather than 
relative terms. This will be apparent if the percentages of moving aver- 
age tend to be large when the original data are at a low level and small 
when the original data are at a high level. 

A simple expedient is as follows. Compute a seasonal index by whatever 
method seems appropriate. The index is now converted into terms 
of the original data by multiplying the seasonal index numbers (expressed 
as percentage deviations) each year by the average value of the original 
series for that year. Seasonal is then eliminated by subtracting, alge- 
braically, the seasonal index from the original data. 
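
A minimal sketch of this expedient follows. It assumes that "percentage deviations" means the index less 100, taken as a fraction, and it borrows the 1944 newsprint figures of Table 16.2 purely as convenient numbers; that series is not one for which the text recommends subtraction. The names are ours.

```python
# Seasonal index (per cent) and 1944 original data from Table 16.2,
# used here only as convenient illustrative numbers.
seasonal_index = [93.9, 90.3, 105.2, 104.7, 105.3, 98.9,
                  88.5, 93.1, 99.3, 110.4, 107.2, 103.5]
original = [194.7, 176.2, 201.7, 201.1, 197.4, 191.1,
            174.9, 182.4, 189.6, 218.1, 211.6, 206.0]

# Average value of the original series for the year.
yearly_average = sum(original) / len(original)

# Assumption: the percentage deviation of each index number is (index - 100) / 100.
# Multiplying by the yearly average expresses seasonal in the original units.
absolute_seasonal = [(s - 100) / 100 * yearly_average for s in seasonal_index]

# Seasonal is eliminated by algebraic subtraction rather than by division.
adjusted = [y - a for y, a in zip(original, absolute_seasonal)]

for y, a, adj in zip(original, absolute_seasonal, adjusted):
    print(f"{y:7.1f} {a:7.1f} {adj:7.1f}")
```

Months whose index is below 100 have a negative absolute seasonal, so subtraction raises them; months above 100 are lowered.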

It may be desirable to compute the index number, in the first instance, 
in such a way as to obtain a seasonal index in absolute rather than relative 
terms. This will be so if the seasonal movements each year seem to be 
similar in absolute magnitude rather than in percentage deviations. 
Inspection of a chart of the original data may indicate whether this is 
true. If the evidence indicates that an index of absolute deviations 
should be computed, it is necessary only to adapt one of the methods with 
which the reader is already familiar. For instance, if the moving-average 
method is used, the moving average is subtracted from, instead of divided 
into, the original data; and the index from that point is constructed as 
usual, the final index being adjusted to total zero by the addition or sub- 
traction of a correction factor. Incidentally, it might be noted that any 
of the devices explained in Chapter 14 may be based on the subtraction 
method of computing seasonal. The link-relative method (described in 
Chapter 14) can also be adapted very easily as follows: (1) obtain link 
differences by subtracting the preceding month from each month; (2) 
average these link differences, month by month; (3) let the first-month 
link difference be zero, and chain the links by successive addition; (4) 
correct chain differences for (upward) trend by successive subtraction of a 
correction factor; (5) adjust chain differences to total zero by addition or 
subtraction of a constant correction factor. 
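
The five steps can be sketched as follows. Two points are our assumptions rather than statements of the text: the arithmetic mean is used for averaging in step (2), and the trend correction of step (4) is taken as one-twelfth of the amount by which the chain, carried on to a second January, fails to return to zero. The function name and the test series are hypothetical.

```python
def link_difference_index(values):
    """Seasonal index in absolute terms from monthly data beginning in January."""
    sums = [0.0] * 12
    counts = [0] * 12
    # (1) link differences: each month minus the preceding month.
    for t in range(1, len(values)):
        month = t % 12
        sums[month] += values[t] - values[t - 1]
        counts[month] += 1
    # (2) average the link differences month by month (arithmetic mean assumed).
    average = [sums[m] / counts[m] for m in range(12)]
    # (3) January is taken as zero; chain by successive addition.
    chain = [0.0]
    for m in range(1, 12):
        chain.append(chain[-1] + average[m])
    # (4) assumed trend correction: spread the second-January discrepancy
    #     evenly, subtracting m-twelfths of it from the m-th month after January.
    second_january = chain[-1] + average[0]
    step = second_january / 12
    corrected = [chain[m] - m * step for m in range(12)]
    # (5) adjust so that the twelve figures total zero.
    mean = sum(corrected) / 12
    return [c - mean for c in corrected]

# Hypothetical series: a fixed absolute seasonal pattern on a straight-line trend.
pattern = [-30, -20, 10, 15, 20, 0, -25, -15, 5, 25, 10, 5]
series = [200 + 2 * t + pattern[t % 12] for t in range(48)]
index = link_difference_index(series)
print([round(v, 1) for v in index])
```

On this artificial series the procedure recovers the pattern that was built in, which is the least one would ask of it; real data would of course leave irregular residue in the averages.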

Adjustment for seasonal and trend. To serve as an illustration 
for most of the balance of this section, we shall use the data of magazine 
advertising linage, for which the trend was ascertained in Chapter 12 and 
for part of which a moving seasonal index was computed in Chapter 15. 
The usual procedure consists, first, of removing the seasonal fluctuations, 
giving 

$(T \times S \times C \times I) \div S = T \times C \times I$ 

and, next, eliminating trend to give 

$(T \times C \times I) \div T = C \times I$. 

We shall use the data of magazine advertising linage from January 1921 
to December 1953. The original, unadjusted data are shown in Chart 
16.4. The removal of seasonal variation is accomplished exactly as 
described for the data of consumption of newsprint, by dividing the 
original data by the seasonal index. This procedure is indicated in Table 
16.3. For magazine advertising, the seasonal indexes used were: (1) a 
constant index for the period 1921-1929, (2) a different constant index for 
1930-1937, (3) a moving seasonal index for 1938-1952, and (4) the 1952 
values repeated for 1953. The use of the 1952 seasonal index for 1953 
follows the usual practice when it is not possible (because of unavailability 
of subsequent data) to extend the moving seasonal index. The deter- 
mination of the 1943-1952 portion of the moving seasonal index was 
described in the preceding chapter, the index appearing in Table 15.3. 
The seasonal indexes were shown graphically in Chart 11.9. The 
deseasonalized data of magazine advertising are shown in Column 4 of 
Table 16.3 and in Chart 16.4. 

The next step consists of eliminating trend, the procedure being the 
same as that shown in Table 16.1, except that we are now dealing with 
monthly data and must put the trend equation into monthly terms. 
Note that while our present illustration concerns the years 1921-1953, the 
trend equation was fitted to the period 1915-1949 and was extended 
through 1953. On page 276 the trend, in monthly terms, was found to be 

$Y_c = 2.6028 + 0.004074X$. 

Origin, July 1932. X units, 1 month. 

TABLE 16.3 

Adjustment of Data of United States Magazine Advertising for Seasonal 
Variation and for Trend, 1921-1953 

(Original data, deseasonalized data, and trend values in thousands of agate lines.) 

Year and     Original data    Seasonal    Deseasonalized data            Trend     Cyclical-irregular 
month        T × S × C × I    index       T × C × I                      values    percentages 
                              S           [Col. (2) ÷ Col. (3)] × 100    T         C × I 
                                                                                   Col. (4) ÷ Col. (5) 
(1)          (2)              (3)         (4)                            (5)       (6) 

1921 
January      1,979             84.8       2,334                          2,041     114.4 
February     1,981             97.2       2,038                          2,045      99.7 
March        2,005            106.6       1,881                          2,049      91.8 
April        2,099            118.2       1,776                          2,053      86.5 
May          2,246            113.5       1,890                          2,057      91.9 
June         1,933            102.6       1,884                          2,061      91.4 
July         1,573             81.6       1,928                          2,065      93.4 
August       1,402             72.0       1,947                          2,069      94.1 
September    1,620             91.2       1,776                          2,073      85.7 
October      1,824            111.9       1,630                          2,077      78.5 
November     1,903            114.1       1,668                          2,081      80.2 
December     1,807            106.3       1,700                          2,085      81.5 

1922 
January      1,632             84.8       1,925                          2,089      92.1 
February     1,768             97.2       1,819                          2,094      86.9 
March        1,922            106.6       1,803                          2,098      85.9 
April        2,171            118.2       1,837                          2,102      87.4 
May          2,215            113.5       1,952                          2,106      92.7 
June         2,046            102.6       1,994                          2,110      94.5 
July         1,705             81.6       2,089                          2,114      98.8 
August       1,566             72.0       2,175                          2,118     102.7 
September    1,940             91.2       2,127                          2,122     100.2 
October      2,470            111.9       2,207                          2,126     103.8 
November     2,466            114.1       2,161                          2,130     101.5 
December     [illegible]      106.3       2,318                          2,134     108.6 

1930 
January      2,505             75.1       3,336                          2,481     134.5 
February     3,024             96.2       3,143                          2,485     126.5 
March        3,416            107.6       3,178                          2,489     127.7 
April        3,877            123.6       3,137                          2,493     125.8 
May          3,639            122.0       2,983                          2,497     119.5 
June         3,354            111.1       3,019                          2,501     120.7 
July         2,451             83.3       2,942                          2,505     117.4 
August       2,067             71.7       2,869                          2,509     114.3 
September    2,598             87.7       2,962                          2,513     117.9 
October      3,021            107.9       2,800                          2,517     111.2 
November     3,042            110.5       2,753                          2,521     109.2 
December     2,820            103.3       2,730                          2,525     108.1 

1931 
January      2,001             75.1       2,664                          2,529     105.3 
February     2,539             96.2       2,639                          2,534     104.1 
March        2,762            107.6       2,569                          2,538     101.2 
April        3,026            123.6       2,448                          2,542      96.3 
May          2,971            122.0       2,435                          2,546      95.6 
June         2,732            111.1       2,459                          2,550      96.4 
July         1,998             83.3       2,399                          2,554      93.9 
August       1,713             71.7       2,389                          2,558      93.4 
September    2,069             87.7       2,359                          2,562      92.1 
October      2,480            107.9       2,298                          2,566      89.6 
November     2,444            110.5       2,212                          2,570      86.1 
December     2,170            103.3       2,101                          2,574      81.6 

1950 
January      3,261             89.0       3,664                          3,458     106.0 
February     3,868            103.4       3,741                          3,462     108.1 
March        4,270            114.8       3,720                          3,466     107.3 
April        4,482            114.0       3,932                          3,471     113.3 
May          3,853            101.7       3,789                          3,475     109.0 
June         2,974             79.0       3,765                          3,479     108.2 
July         3,175             80.0       3,969                          3,483     114.0 
August       3,791             98.2       3,860                          3,487     110.7 
September    4,505            116.0       3,884                          3,491     111.3 
October      4,602            120.2       3,829                          3,495     109.6 
November     3,958            104.1       3,802                          3,499     108.7 
December     3,106             79.6       3,902                          3,503     111.4 

1951 
January      3,520             88.7       3,968                          3,507     113.1 
February     4,050            102.6       3,947                          3,511     112.4 
March        4,464            115.5       3,865                          3,515     110.0 
April        4,531            114.2       3,968                          3,519     112.8 
May          3,926            100.9       3,891                          3,524     110.4 
June         3,221             79.4       4,057                          3,528     115.0 
July         3,260             79.7       4,090                          3,532     115.8 
August       3,934             98.0       4,014                          3,536     113.6 
September    4,845            118.2       4,099                          3,540     115.8 
October      4,849            120.2       4,034                          3,544     113.8 
November     4,129            103.4       3,993                          3,548     112.5 
December     3,346             79.2       4,225                          3,552     118.9 

1952 
January      3,466             88.0       3,939                          3,556     110.8 
February     3,985            101.5       3,926                          3,560     110.3 
March        4,855            117.7       4,125                          3,564     115.7 
April        4,468            114.3       3,909                          3,568     109.6 
May          4,093            100.5       4,073                          3,572     114.0 
June         3,213             79.9       4,021                          3,576     112.5 
July         3,133             79.2       3,956                          3,581     110.6 
August       3,960             97.9       4,045                          3,585     112.8 
September    4,798            119.2       4,025                          3,589     112.1 
October      4,898            120.0       4,082                          3,593     113.6 
November     4,299            103.0       4,174                          3,597     116.0 
December     3,162             78.8       4,013                          3,601     111.4 

1953 
January      3,667             88.0       4,167                          3,605     115.6 
February     4,251            101.5       4,188                          3,609     116.0 
March        4,991            117.7       4,240                          3,613     117.4 
April        4,699            114.3       4,111                          3,617     113.7 
May          4,445            100.5       4,423                          3,621     122.1 
June         3,360             79.9       4,205                          3,625     116.0 
July         3,205             79.2       4,047                          3,629     111.5 
August       4,136             97.9       4,225                          3,634     116.3 
September    4,965            119.2       4,165                          3,638     114.5 
October      [illegible]      120.0       4,358                          3,642     119.7 
November     4,406            103.0       4,278                          3,646     117.3 
December     [illegible]       78.8       4,011                          3,650     109.9 

Magazine advertising linage from various issues of the Survey of Current Business. Seasonal index: 
fixed for 1921-1929 and for 1930-1937 from Table 116 of the first edition of this text, changing for 
1938-1942 from worksheets not shown, changing for 1943-1952 from Table 15.3, 1953 same as 1952. 
Trend values from the equation given on page 373. 





; see also 

Now, the equation just given is in terms of millions of agate lines, while 
the monthly data of Table 16.3 are in terms of thousands of agate lines. 
Therefore, the equation becomes 

$Y_c = 2{,}602.8 + 4.074X$ 

with the same origin and X units as before. 

The trend values shown in Column 5 of Table 16.3 were obtained from 
this equation. Now, the deseasonalized values in Column 4 of Table 
16.3 are each divided by the corresponding trend value 
[$(T \times C \times I) \div T = C \times I$] to produce the cyclical-irregular 
values in Column 6 of the table. These cyclical-irregular values are shown 
in Chart 16.5. It is 
interesting to note that the values shown in Column 6 of Table 16.3 are 
percentages, not thousands of agate lines. When seasonal movements 
are eliminated by dividing by a seasonal index (which is a series of per- 
centages), the deseasonalized data are always in the same units as were the 
original data. Trend, however, is in terms of the original units, so that 
when the trend of a series is eliminated by dividing, the resulting figures 
are percentages. 
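
A brief sketch of the computation for a single month may make the units clear. The trend equation is the one in thousands of agate lines with origin July 1932; the helper that counts months from the origin, and all the names, are ours.

```python
def months_from_origin(year, month):
    """X in the trend equation: whole months counted from the origin, July 1932."""
    return (year - 1932) * 12 + (month - 7)

def trend_thousands(year, month):
    """Monthly trend value in thousands of agate lines."""
    return 2602.8 + 4.074 * months_from_origin(year, month)

# January 1952, from Table 16.3: original data and seasonal index.
original, seasonal = 3466, 88.0

deseasonalized = 100 * original / seasonal                 # thousands of agate lines
cyclical_irregular = 100 * deseasonalized / trend_thousands(1952, 1)   # a percentage

print(round(trend_thousands(1921, 1)), round(trend_thousands(1952, 1)))  # 2041 3556
print(round(deseasonalized), round(cyclical_irregular, 1))               # 3939 110.8
```

Dividing by the seasonal percentage leaves the result in agate lines; dividing by the trend, which is itself in agate lines, cancels the units and leaves a percentage.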

In Table 16.3 the cyclical-irregular movements were obtained by 
eliminating, first, seasonal variation and then trend. In symbols, the 
procedure was 

$(T \times S \times C \times I) \div S = T \times C \times I$, the deseasonalized data, and 
$(T \times C \times I) \div T = C \times I$, the cyclical-irregular movements. 

If it were desirable to do so, we could, of course, eliminate first trend and 
then seasonal variation, thus: 

$(T \times S \times C \times I) \div T = S \times C \times I$, the data adjusted for trend, and 
$(S \times C \times I) \div S = C \times I$, the cyclical-irregular movements. 

Another possibility consists of multiplying together the trend and 
seasonal values (the seasonal percentages being used as decimal ratios) and 
eliminating both of those movements at the same time. In symbols, 
this is 

$(T \times S \times C \times I) \div (T \times S) = C \times I$, the cyclical-irregular movements. 

Table 16.4 illustrates these three possible procedures for magazine 
advertising linage for 1952. Note that the final results by the three methods, 
which are shown in Column 6 of each part of Table 16.4, either agree 
exactly or occasionally differ by 0.1 because of rounding. 
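
That the three orders of operation are algebraically the same can be seen from a sketch using the 1952 figures of Table 16.4. Unlike the table, this sketch does not round the intermediate columns, so the three results agree to within floating-point error; the differences of 0.1 in the table arise only from rounding at each step. The names are ours.

```python
# 1952 figures from Table 16.4 (original data and trend in thousands of agate lines).
original = [3466, 3985, 4855, 4468, 4093, 3213, 3133, 3960, 4798, 4898, 4299, 3162]
seasonal = [88.0, 101.5, 117.7, 114.3, 100.5, 79.9, 79.2, 97.9, 119.2, 120.0, 103.0, 78.8]
trend = [3556, 3560, 3564, 3568, 3572, 3576, 3581, 3585, 3589, 3593, 3597, 3601]

rows = list(zip(original, seasonal, trend))

# I. Seasonal variation first, then trend.
seasonal_then_trend = [100 * (100 * y / s) / t for y, s, t in rows]

# II. Trend first, then seasonal variation.
trend_then_seasonal = [100 * (100 * y / t) / s for y, s, t in rows]

# III. Trend and seasonal combined (seasonal used as a decimal ratio).
combined = [100 * y / (t * s / 100) for y, s, t in rows]

largest_gap = max(max(a, b, c) - min(a, b, c)
                  for a, b, c in zip(seasonal_then_trend, trend_then_seasonal, combined))
print(round(seasonal_then_trend[0], 1), largest_gap < 1e-9)  # 110.8 True
```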

Of the three procedures for adjusting for seasonal variation and trend, 
the one first described is most frequently used, since it is often desired 

TABLE 16.4 

Three Methods of Obtaining Cyclical-Irregular Movements of United States 
Magazine Advertising for 1952 

I. Adjustment for seasonal variation and then for trend. 

             Original data    Seasonal    Deseasonalized data            Trend     Cyclical-irregular 
Month        T × S × C × I    index       T × C × I                      values    percentages 
                              S           [Col. (2) ÷ Col. (3)] × 100    T         C × I 
                                                                                   Col. (4) ÷ Col. (5) 
(1)          (2)              (3)         (4)                            (5)       (6) 

January      3,466             88.0       3,939                          3,556     110.8 
February     3,985            101.5       3,926                          3,560     110.3 
March        4,855            117.7       4,125                          3,564     115.7 
April        4,468            114.3       3,909                          3,568     109.6 
May          4,093            100.5       4,073                          3,572     114.0 
June         3,213             79.9       4,021                          3,576     112.5 
July         3,133             79.2       3,956                          3,581     110.6 
August       3,960             97.9       4,045                          3,585     112.8 
September    4,798            119.2       4,025                          3,589     112.1 
October      4,898            120.0       4,082                          3,593     113.6 
November     4,299            103.0       4,174                          3,597     116.0 
December     3,162             78.8       4,013                          3,601     111.4 

II. Adjustment for trend and then for seasonal variation. 

             Original data    Trend     Per cent of trend      Seasonal    Cyclical-irregular 
Month        T × S × C × I    values    S × C × I              index       percentages 
                              T         Col. (2) ÷ Col. (3)    S           C × I 
                                                                           Col. (4) ÷ Col. (5) 
(1)          (2)              (3)       (4)                    (5)         (6) 

January      3,466            3,556      97.5                   88.0       110.8 
February     3,985            3,560     111.9                  101.5       110.2 
March        4,855            3,564     136.2                  117.7       115.7 
April        4,468            3,568     125.2                  114.3       109.5 
May          4,093            3,572     114.6                  100.5       114.0 
June         3,213            3,576      89.8                   79.9       112.4 
July         3,133            3,581      87.5                   79.2       110.5 
August       3,960            3,585     110.5                   97.9       112.9 
September    4,798            3,589     133.7                  119.2       112.2 
October      4,898            3,593     136.3                  120.0       113.6 
November     4,299            3,597     119.5                  103.0       116.0 
December     3,162            3,601      87.8                   78.8       111.4 

in. Adjnstment for combined trend and seasonal movements. 


Month 

(1) 

1 Original 

data 

TXSXCXI 

<2) 

Trend 
■ values 
: T 

m 

Seasonal 

index 

1 ^ 

! (♦) 

^’Normal” values 1 
T* X <5 

CoL (3) X Col. (4) 

■ (5) 

: Cyclical-irregular 
percentages 
' <7X1 

I Col (2) -i- CoL (6) 
i (6) 

January 

3,460 

3,556 

88.0 

3,129 

1 ud.8 

February 

3,985 

3,560 

101.5 

3,613 

1 110.3 

March 

4,855 

3,564 

117.7 

4,196 

115.7 

April 

4,468 

3,568 

114.3 

4,078 

109. § 

May 

4,093 

3,572 1 

100.5 

3,590 

114. 0 

June 

3,213 

3,576 j 

79.9 

2.857 

112.5 

July . 

3,133 

3,581 i 

79.2 

2.836 

110.6 

August 

3,960 

3,685 I 

97.9 

3.510 

112,8 

September 

4,798 

3,689 ' 

119.2 

4,278 

112.2 

October ... 

4,898 

3,693 ^ 

120.0 

4,312 

US. 6 

November ; 

4.299 

3,597 

103.0 

3,705 

116.0 

December 

3,162 

3,001 

78.8 

2,838 

111.4 


Data from sources givea below Table 16*3* 
to study a series adjusted for seasonal variation as well as to observe the
cyclical-irregular movements. Since one rarely is interested in adjusting
a monthly series for trend alone, the second procedure is not often used.
If the sole purpose of the analysis is to obtain the cyclical-irregular movements
(either as a final objective or as a step toward getting the cyclical
movements), the third method shown in Table 16.4 will be slightly less
time-consuming than either of the others, since most types of calculating
machines can more quickly perform the series of multiplications which
replaces one of the two series of divisions present in the other methods.

However the cyclical-irregular movements are obtained, those values
are often referred to as percentages of “normal.” The term “normal”
is frequently used in economics, business, psychology, statistics, and in
other fields, and it is not always used with the same meaning. In this
instance, normal refers to the combined trend and seasonal movements
of a series, the thought being that from a long-run point of view it is
normal for an industry to increase (or decrease) in some steady fashion,
and that from a short-run viewpoint it is normal for seasonal variation
to be present. Taken together, both movements are “normal.”

Smoothing irregular movements. The interplay of a multitude
of forces, other than those already eliminated, is largely responsible for
the irregular movements which are usually to be seen in the curve of a
series adjusted for seasonal variation and trend. The irregular fluctuations
in magazine advertising linage are apparent in Chart 16.5. Occasionally,
irregular fluctuations may occur because the seasonal index
which was used was not as good as might be desired. Earlier consideration
of the seasonal index for magazine advertising linage has indicated
that it was satisfactory.

Irregular fluctuations cannot be completely eliminated from a series 
without the accompanying danger of over-smoothing. However, the 
irregular movements can be smoothed, so as to bring the cyclical move- 
ments into clearer relief, by the use of a short-term moving average. 
From an examination of Chart 16.5 it appears that most of the irregular 
movements are of one month's duration, although occasionally, as in late
1927 and early 1928, they appear to last longer than one month. To
smooth out these movements, we could use a two-month moving average,
except that the values of such an average should be plotted between each
pair of months. If we were to average three months, the average would
appropriately fall opposite the center month, but we would encounter
another serious predicament: if the first and third months were high and
the second month low, the resulting average would be high; if the first
and third months were low and the second month high, the average
would be low.² A three-month average would therefore sometimes introduce
reverse fluctuations into the series. Both of the foregoing difficulties
may be overcome by using a three-month moving average weighted
1, 2, 1, which is, of course, a centered two-month moving average. Table
16.5 indicates how this average is obtained: first a three-month moving
total weighted 1, 2, 1 is obtained for the cyclical-irregular values, and then

TABLE 16.5

Computation of Cyclical Movements for Data of United States Magazine
Advertising, 1921-1953

(1) Year and month
(2) Cyclical-irregular percentages, C × I
(3) Three-month moving total weighted 1, 2, 1 of Col. (2)
(4) Cyclical percentages, C: Col. (3) ÷ 4

   (1)            (2)      (3)      (4)
1921
January          114.4     ...      ...
February          99.7    405.6    101.4
March             91.8    369.8     92.4
April             86.5    356.7     89.2
May               91.9    361.7     90.4
June              91.4    368.1     92.0
July              93.4    372.3     93.1
August            94.1    367.3     91.8
September         85.7    344.0     86.0
October           78.5    322.9     80.7
November          80.2    320.4     80.1
December          81.5    335.3     83.8
1922
January           92.1    352.6     88.2
February          86.9    351.8     88.0
March             85.9    346.1     86.5
April             87.4    353.4     88.4
May               92.7    370.3     94.1
June              94.5    380.5     95.1
July              98.8    394.8     98.7
August           102.7    404.4    101.1
September        100.2    406.9    101.7
October          103.8    409.3    102.3
November         101.5    415.4    103.8
December         108.6    434.1    108.5
  . . .
1952
January          110.8    450.8    112.7
February         110.3    447.1    111.8
March            115.7    451.3    112.8
April            109.6    448.9    112.2
May              114.0    450.1    112.5
June             112.5    449.5    112.4
July             110.5    446.3    111.6
August           112.8    448.2    112.0
September        112.1    450.6    112.6
October          113.6    455.3    113.8
November         116.0    457.0    114.2
December         111.4    454.4    113.6
1953
January          115.6    458.6    114.6
February         116.0    465.0    116.2
March            117.4    464.5    116.1
April            113.7    466.9    116.7
May              122.1    473.9    118.5
June             116.0    465.6    116.4
July             111.5    455.3    113.8
August           116.3    458.6    114.6
September        114.5    465.0    116.2
October          119.7    471.2    117.8
November         117.3    464.2    116.0
December         109.9     ...      ...

Cyclical-irregular percentages from Table 16.3.

the moving-total values are each divided by 4 to arrive at the moving
average. The moving totals should be obtained by use of an adding
machine, each total being obtained separately and not by use of successive
subtotals as was done when we computed a 13-month weighted moving
total in Table 14.5. The moving averages should be obtained from the
moving totals by multiplying by 0.25, rather than by dividing by 4, since
most calculating machines will produce the results faster when a constant
multiplier is used. Note that the figures in the second column of Table
16.5 are the same as those in Column 6 of Table 16.3. In actual practice,
Columns 3 and 4 of Table 16.5 would be included as additional columns
of Table 16.3. Two separate tables are shown here because of the difficulty
of showing so large a table on the printed page of this text. Note
that there will be no three-month moving-average figure for the first
month and the last month of a series.
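
The computation just described can be sketched as follows. The function names are hypothetical, and the input is simply the first six cyclical-irregular percentages of 1921 from Table 16.5; the second function is included only to illustrate the statement that the 1, 2, 1 weighting is a centered two-month moving average.

```python
def moving_average_121(values):
    """Three-month moving average weighted 1, 2, 1.

    Returns a list the same length as `values`; the first and last
    entries are None because no average exists for those months.
    """
    averages = [None] * len(values)
    for i in range(1, len(values) - 1):
        total = values[i - 1] + 2 * values[i] + values[i + 1]  # moving total
        averages[i] = total * 0.25                             # constant multiplier
    return averages

def centered_two_month(values):
    """Centered two-month moving average, for comparison."""
    pair_means = [(a + b) / 2 for a, b in zip(values, values[1:])]
    centered = [(a + b) / 2 for a, b in zip(pair_means, pair_means[1:])]
    return [None] + centered + [None]

# Cyclical-irregular percentages, January-June 1921, from Table 16.5.
ci_1921 = [114.4, 99.7, 91.8, 86.5, 91.9, 91.4]
smoothed = moving_average_121(ci_1921)
print([None if v is None else round(v, 1) for v in smoothed])
```

The February figure, for example, is (114.4 + 2 × 99.7 + 91.8) × 0.25 = 101.4, as in Column 4 of the table.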

The result of smoothing the cyclical-irregular values by the use of a 
three-month moving average weighted 1, 2, 1 is shown in Chart 16.6. 
It is clear that this curve is much smoother than the curve of Chart 16.5, 
although there are a few spots where the moving average was of too short 
duration to smooth out the irregular fluctuations completely. Irregular 
movements are not often entirely eliminated from a series. Their com- 
plete elimination may call for freehand smoothing or use of a moving 
average of longer duration than three months. In any event, the smooth- 
ing process must not hide the turning points of the cyclical movements. 
Since a four-month moving average would have the same shortcomings
as a two-month moving average, the practicable moving average, next
longer in duration than the one used in Table 16.5, would be a (weighted)
five-month moving average. Five-month moving-average values are
set opposite the third month of each set of five months. The months are
often weighted 1, 2, 4, 2, 1, which gives greatest weight to the center
month and least weight to the end months. Since this weight pattern
totals 10, the moving averages may be computed from the moving totals
without use of a calculating machine.
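
A weighted moving average of any odd length can be sketched in the same way. The function below is a hypothetical helper, not a procedure given in the text; it assumes an odd number of weights so that each average falls opposite a center month.

```python
def weighted_moving_average(values, weights=(1, 2, 4, 2, 1)):
    """Centered weighted moving average; None where no average exists.

    Assumes len(weights) is odd, so each average has a center month.
    """
    half = len(weights) // 2
    total_weight = sum(weights)          # 10 for the pattern 1, 2, 4, 2, 1
    averages = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        moving_total = sum(w * v for w, v in zip(weights, window))
        averages[i] = moving_total / total_weight
    return averages

# A simple check on made-up values: the weighted total is
# 1 + 4 + 12 + 8 + 5 = 30, and 30 / 10 = 3.
example = weighted_moving_average([1, 2, 3, 4, 5])
print(example)
```

With the weights 1, 2, 4, 2, 1 the first two and last two months of a series receive no average, just as the 1, 2, 1 pattern leaves the first and last months without one.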

The irregular movements. The irregular movements themselves may
be obtained by dividing the cyclical-irregular values, shown in Column 2
of Table 16.5, by the cyclical values, which are in Column 4 of the same
table. The computation of the irregular movements is not shown, but
Chart 16.7 shows these, month by month, and Chart 16.8 gives a frequency
distribution of the irregular variations. If the irregular movements
were of a random character, they might be expected to form a
normal curve. Although the curve of Chart 16.8 is nearly symmetrical
(β₁ = 0.0005), it is leptokurtic, having β₂ = 3.90. If the deviation of
-11.1, not shown in Chart 16.8, is included in the computations, both
skewness and leptokurtosis are increased. This is the sort of frequency
distribution to be expected for the irregular movements of a time series,
since, in addition to minor fluctuations, there are ordinarily others that
are episodic in nature and the effects of which may continue (or cumulate)
over several months. The data of magazine advertising are rather “well
behaved” in this respect, the deviations continuing on the same side of
the zero line³ of Chart 16.8 for five months at a time only once, for four
months at a time only three times, and for three months at a time only
eight times.
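
As a rough sketch of the computation that is not shown, the irregular movements and the two measures of shape might be obtained as below. The function names are hypothetical, and β₁ and β₂ are assumed here to be the usual moment ratios (the squared third moment over the cubed second moment, and the fourth moment over the squared second moment), for which a normal curve gives 0 and 3. The five months used are February-June 1952 from Table 16.5 and are far too few to say anything about the shape of the full distribution.

```python
def irregular_deviations(cyclical_irregular, cyclical):
    """I = (C x I) / C, expressed as percentage deviations from 100."""
    return [ci / c * 100 - 100 for ci, c in zip(cyclical_irregular, cyclical)]

def moment_ratios(values):
    """Return (beta_1, beta_2) computed from moments about the mean."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2

# February-June 1952, Columns 2 and 4 of Table 16.5.
ci = [110.3, 115.7, 109.6, 114.0, 112.5]
c = [111.8, 112.8, 112.2, 112.5, 112.4]
deviations = irregular_deviations(ci, c)
print([round(d, 1) for d in deviations])
```

Applied to all the months of 1921-1953, `moment_ratios` would be the step that yields figures comparable to the β₁ and β₂ quoted above.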

Chart 16.8. Frequency Distribution of Irregular Movements in Magazine
Advertising in the United States, 1921-1953. (Vertical scale: number of
months.) The irregular movements are I = (C × I) ÷ C and are expressed as
percentage deviations. Data computed from Columns 2 and 4 of Table 16.5 and
from worksheets (not shown) for the years omitted from that table.

Comparing cyclical movements. One reason for wishing to isolate 
cyclical movements in a time series is the desire to compare them with the 
cyclical movements in one or more other series. Occasionally it may be 
thought that one series more or less consistently precedes another at its 
cyclical turning points.⁴ However, when two series differ in regard to
the amplitude of their fluctuations, some difficulty is experienced in com- 
paring the timing of those fluctuations. The more marked the difference 
in amplitudes, the more important it is to make some sort of an adjust- 
ment for that difference. 

As an illustration we shall use the Index of Durable Manufactures and
the Index of Nondurable Manufactures for January 1946-December
1953, both of which are issued by the Board of Governors of the Federal
Reserve System. Chart 16.9 shows these two series, adjusted for trend
and seasonal movements, with irregular fluctuations smoothed, and
expressed as cyclical deviations. Cyclical deviations give the same curve
as cyclical percentages; the values are merely expressed differently, for
example, 102.5 is +2.5, 101.2 is +1.2, 100 is 0, 98.3 is -1.7, 96.4 is -3.6,
and so forth. Although the two series in Chart 16.9 are not markedly
different in regard to their cyclical fluctuations, it is clear that the Index
of Durable Manufactures shows greater amplitude than does the Index of
Nondurable Manufactures.

³ This is not easy to see from the chart. The counts were made from the data upon
which the chart is based.

⁴ Lead-lag relationships are discussed in Chapter 22.

Chart 16.9. Cyclical Deviations of Federal Reserve Index of Production of
Durable Manufactures and of Index of Nondurable Manufactures, 1946-
1953. (Vertical scale: per cent.) For sources of data, see note to Table 16.6.

One possible method of making the amplitudes of the cyclical move- 
ments more nearly alike consists of using different vertical scales for the 
two series. While this is a simple solution, it is not easy to decide just 
what relationship the two vertical scales should bear to each other; for 
example, if the maximum fluctuations were to be made to cover the same 
vertical distances, the series of greater amplitude might be compressed 
too much in some portions. A more satisfactory procedure consists of 
expressing each series in terms of its own standard deviation and employ- 
ing but one vertical scale. 

Table 16.6 indicates the procedure for computing the value of s for the
Index of Nondurable Manufactures. The formula used for obtaining s
is the one employed for ungrouped data in Chapter 10. As shown below
Table 16.6, s = 3.249 for the Index of Nondurable Manufactures.
Similar computations give s = 7.502 for the Index of Durable Manufactures.
The last column of Table 16.6 shows the cyclical deviations for
TABLE 16.6

Computation of s for Cyclical Deviations of Federal Reserve
Index of Production of Nondurable Manufactures and
Cyclical Deviations in Terms of s, 1946-1953

(The original index figures had 1947-1949 = 100.)

(1) Year and month
(2) Cyclical deviations,* y
(3) Squares of Col. (2), y²
(4) Cyclical deviations in terms of s: Col. (2) ÷ s

   (1)            (2)        (3)        (4)
1946
January          +0.1       0.01      +0.03
February         +1.9       3.61      +0.58
March            +1.6       2.56      +0.49
April            +0.3       0.09      +0.09
May              -0.8       0.64      -0.25
June             -2.2       4.84      -0.68
July             -3.0       9.00      -0.92
August           -2.0       4.00      -0.62
September        -0.7       0.49      -0.22
October          +0.8       0.64      +0.25
November         +2.6       6.76      +0.80
December         +3.1       9.61      +0.95
  . . .
1953
January          +0.8       0.64      +0.25
February         +1.0       1.00      +0.31
March            +1.8       3.24      +0.55
April            +3.1       9.61      +0.95
May              +3.7      13.69      +1.14
June             +3.0       9.00      +0.92
July             +2.0       4.00      +0.62
August           +0.4       0.16      +0.12
September        -1.1       1.21      -0.34
October          -2.2       4.84      -0.68
November         -3.6      12.96      -1.11
December         -5.3      28.09      -1.63

Total            -5.6   1,013.92

* Cyclical deviations may be expected to total very nearly zero if the trend was fitted
by least squares to data covering the same period as the data under consideration.
Since the trend for the Index was fitted to a period a few months longer than that
shown in the table, Σy ≠ 0 and the correction factor (Σy/N)² has been employed in
computing s. Omitting the correction factor gives s = 3.250.

s = √[Σy²/N − (Σy/N)²] = 3.249.

Deseasonalized data from Federal Reserve Bulletin, December 1953, pp. 1326-1327
and mimeographed releases; trend and irregular movements removed by the writers.
the Index of Nondurable Manufactures expressed in terms of s = 3.249.
Similar computations were made for the Index of Durable Manufactures.
Both series are shown in Chart 16.10, where it is clear that the amplitude
of the fluctuations of the two series is now more nearly the same.
Although the cyclical fluctuations of time series cannot be expected to be
distributed normally,⁵ it is interesting to note that the values for both
series are within ±3 standard deviations. This will not always be true;
values of ±4, or even more, are occasionally found.
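
The computation of s and of the deviations in terms of s can be sketched briefly. The totals Σy = -5.6 and Σy² = 1,013.92 are those of Table 16.6; the number of months, N = 96, is an assumption made here from the period January 1946-December 1953, and the function name is hypothetical.

```python
import math

def standard_deviation(sum_y, sum_y_squared, n, use_correction=True):
    """s from the totals of y and y squared, with optional correction factor."""
    correction = (sum_y / n) ** 2 if use_correction else 0.0
    return math.sqrt(sum_y_squared / n - correction)

N = 96  # assumed: 8 years x 12 months, 1946-1953
s = standard_deviation(-5.6, 1013.92, N)
s_uncorrected = standard_deviation(-5.6, 1013.92, N, use_correction=False)
print(round(s, 3), round(s_uncorrected, 3))

# Cyclical deviations for May and December 1953, expressed in terms of s.
in_terms_of_s = [round(y / s, 2) for y in (3.7, -5.3)]
print(in_terms_of_s)
```

Under that assumption the two values of s come to 3.249 and 3.250, and the two deviations to +1.14 and -1.63, in agreement with the table.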


Chart 16.10. Cyclical Deviations, in Units of Their Standard Deviations,
of Index of Production of Durable Manufactures and Index of Production of
Nondurable Manufactures, 1946-1953. (Vertical scale: standard deviations.)
For sources of data, see note to Table 16.6.


A chart such as Chart 16.10 is sometimes referred to as a cycle chart,
since its object is to facilitate the comparison of cyclical movements.⁶
The vertical scale of such a chart, when seen in a nontechnical publication,
may be labeled “cycle values” with no specific mention of the fact that
the values are in terms of s. The omission is ordinarily an intentional
one, since readers of the newspaper or magazine might not understand
the meaning of s.

A more striking illustration of two series with fluctuations of different 
amplitude, but dealing with annual data, is given in Charts 22.4 and 22.7, 
which show data of production of by-product and of beehive coke, first as 
deviations from trend and then as deviations from trend in terms of s.


⁵ The normal curve is discussed in Chapter 23. The characteristic of s which is
referred to here was mentioned on pages 219-220.

⁶ An interesting application is shown in “Southern Maryland a Tobacco Economy,”
University of Maryland, Bureau of Business and Economic Research, Studies in
Business and Economics, March 1954, pp. 16 and 17.
OTHER METHODS OF ESTIMATING CYCLICAL MOVEMENTS

Although the residual method of isolating cyclical movements involves 
extensive computation, it is the most widely used procedure. Brief 
mention will be made of three other methods. 

Direct analysis. One possibility consists of expressing each month
as a percentage of the corresponding month of the preceding year. This
operation results in roughly eliminating seasonal variation and secular
trend.⁷ However, some residual trend will remain, since the percentages
will tend to be above 100 if the trend is upward, but below 100 if the trend
is downward. Even if the residual trend is eliminated, the resulting
“cycles” are somewhat different from the sort of fluctuations previously
discussed; the percentages represent cyclical changes rather than the cyclical
level. Thus, a year (or other) period may be high, not because it was
at a high level, but because the preceding year was especially low. This
method has the advantage of paralleling the business man's often
expressed comparison of a given month with the same month a year ago.
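
The direct method amounts to a single division per month, as in the sketch below. The data are hypothetical: a made-up seasonal pattern for one year, repeated in the next year at a level 10 per cent higher, with no cyclical or irregular movement at all.

```python
def percent_of_year_ago(monthly):
    """Each month as a percentage of the same month one year earlier."""
    return [monthly[i] / monthly[i - 12] * 100 for i in range(12, len(monthly))]

# Hypothetical data: same seasonal pattern, second year 10 per cent higher.
first_year = [80, 90, 110, 120, 100, 85, 75, 95, 115, 125, 105, 100]
second_year = [v * 1.1 for v in first_year]
percentages = percent_of_year_ago(first_year + second_year)
print([round(p, 1) for p in percentages])
```

Every percentage comes to about 110: the seasonal pattern has dropped out, but the upward trend remains as a constant excess over 100, which is the residual trend referred to above.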

A variation of the direct method expresses each month as a percentage 
of the average for the corresponding month for several previous years. 
The number of years to consider depends upon the length of the cycles
in the series, the average length of the cycles often being used. This 
involves a decision concerning the length of the individual cycles, before 
the cyclical movements have been obtained. Furthermore, it is rare that 
cycles in economic series are uniform in duration (or amplitude), so that 
rather serious distortion of the data may still result. 

Harmonic analysis. When the cyclical movements of a series are of
about the same duration and amplitude, a sine-cosine or similar type of
curve, having regular undulatory movements, may be fitted. Such a
curve may be fitted to the cyclical-irregular data or to the data after the
irregular movements have been smoothed. Since series having cyclical
movements of fairly regular periodicity and amplitude are rare in the
social sciences and in business, we shall not discuss the fitting of a harmonic
series in this volume. The procedure for fitting a sine-cosine curve
was described in the first edition of this text,⁸ pages 554-560.

Reference-cycle analysis. When several time series are being
studied, it would, of course, be possible to compare the cyclical movements
of each series with the cyclical movements of every other series under
consideration. A procedure, involving “reference dates,” has been
designed by the National Bureau of Economic Research as a device which
allows one not only to compare each series with a standard set of dates
and to observe the behavior of individual series during expansion and
contraction of general business, but also to compare the results for the
various individual series. The following description is oversimplified,
but it should give the reader a general idea of the method.⁹

⁷ M. A. Brumbaugh, Direct Method of Determining Cyclical Fluctuations of
Economic Data, Prentice-Hall, Inc., New York, 1926.

⁸ See also H. L. Rietz (editor), Handbook of Mathematical Statistics, Houghton
Mifflin Co., Boston, 1924, Ch. XI, “Periodogram Analysis” by W. L. Crum. Moving
arcs, which are more flexible, are described in Max Sasuly, Trend Analysis of Statistics,
Brookings Institution, Washington, 1934, Ch. IX.

The first step is the selection of the reference dates, which are the dates
of the peaks and troughs of business cycles. To avoid any possible misunderstanding,
it may be well to point out that “business cycles” means
the cyclical fluctuations in general business activity, not the cycles in any
one field or area. The reference dates, which are applied to all individual
series, were chosen after examination of a large number of economic
time series and after study of “the contemporary reports of observers of
the business scene.”

The next step consists of processing the data of the individual series in 
order to obtain a cyclical pattern for each series for the period between 
each two successive reference troughs. Each period is the same for all 
series, enabling one to compare the results for the various series. The 
processing of each series proceeds as follows: 

(1) The data are adjusted for seasonal variation. 

(2) The seasonally adjusted data are divided into “reference-cycle
segments,” these segments corresponding to the intervals between adjacent
reference troughs.

(3) For each segment, the monthly values are expressed as percentages
of the average of all the values in the segment. These are “reference-cycle
relatives.” Note that, as a result of this step, all series, no matter
what the original unit, are in percentage form. Note also that this step
eliminates inter-cycle trend, since the average of the relatives for each
cycle is 100; but it does not eliminate intra-cycle trend. The inclusion
of intra-cycle trend is regarded as desirable, since it “helps to reveal and
to explain what happens during business cycles.”

(4) Each reference-cycle segment is now broken into nine stages, to
correspond to the same nine stages in the business cycle, and the reference-cycle
relatives are averaged for each of the nine stages. The nine stages
are identified as:

I. The three months centered on the initial trough.

II. The first third of the expansion period.

III. The second third of the expansion period.

IV. The last third of the expansion period.

V. The three months centered on the peak.

VI. The first third of the contraction period.

VII. The second third of the contraction period.

VIII. The last third of the contraction period.

IX. The three months centered on the terminal trough.

The nine stage averages for each reference-cycle segment serve to reduce
the erratic movements in a series and give a reference-cycle pattern for
the particular series under consideration.

⁹ A more adequate description may be found in G. H. Moore, Statistical Indicators of
Cyclical Revivals and Recessions, Occasional Paper No. 31, National Bureau of Economic
Research, Inc., New York, 1950, pp. 3-12. Further details of the method are
in W. C. Mitchell, What Happens During Business Cycles: A Progress Report, National
Bureau of Economic Research, Inc., New York, 1951, pp. 9-50. Brief quotations in
the above description are from these sources. A full description and analysis of the
merits and limitations of the method is contained in the source cited in note 10,
below.

Chart 16.11 (Continued). Series shown: 1, Residential Building Contracts, Floor
Space; 2, Industrial Production Index, FRB; 3, Railroad Locomotive Shipments;
4, Business Failures, Liabilities, Manufacturing Companies; 5, Refined Copper
Stocks; 6, Bond Sales, New York Stock Exchange; 7, Agricultural Marketings Index.
Legend: pattern for cycle; average pattern, 5 cycles.

The results of the preceding operations are shown graphically in Chart 
16.11 for seven series. In this chart, the reference troughs are indicated 
by T, the reference peaks by P. This chart shows not only the pattern
for each series for each cycle, but also the average pattern for each series 
over the five cycles. The divisions at the top of the chart show the nine 
stages for each reference cycle, while those at the bottom (which are alike) 
show the nine stages as averaged for the five cycles. 
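
Steps (3) and (4) of the processing described above can be sketched for a single reference-cycle segment. This is a simplified illustration under stated assumptions and not the National Bureau's own procedure: the function names and the series are hypothetical, the data are taken to be already seasonally adjusted, and the rule used here for cutting an expansion or contraction into thirds (the months strictly between two turning points, divided as evenly as possible) is an assumption, since the description above does not spell it out.

```python
def mean(values):
    return sum(values) / len(values)

def reference_cycle_pattern(series, trough0, peak, trough1):
    """Nine stage averages for one reference-cycle segment.

    series  : seasonally adjusted monthly values
    trough0 : index of the initial reference trough (must be >= 1)
    peak    : index of the reference peak
    trough1 : index of the terminal trough (must be <= len(series) - 2)
    At least three months must lie between adjacent turning points.
    """
    # Step (3): express values as percentages of the segment average.
    base = mean(series[trough0:trough1 + 1])
    relatives = [100 * v / base for v in series]

    def centered(i):
        # Stages I, V, IX: three months centered on a turning point.
        return mean(relatives[i - 1:i + 2])

    def thirds(start, end):
        # Stages II-IV and VI-VIII: months strictly between two turning
        # points, cut into three nearly equal parts (assumed rule).
        months = relatives[start + 1:end]
        n = len(months)
        cuts = [n * k // 3 for k in range(4)]
        return [mean(months[cuts[k]:cuts[k + 1]]) for k in range(3)]

    return ([centered(trough0)] + thirds(trough0, peak) + [centered(peak)]
            + thirds(peak, trough1) + [centered(trough1)])

# Hypothetical series: trough at index 1, peak at 7, trough at 13.
series = [12, 10, 12, 14, 16, 18, 20, 22, 20, 18, 16, 14, 12, 10, 12]
pattern = reference_cycle_pattern(series, 1, 7, 13)
print([round(p, 1) for p in pattern])
```

The nine numbers returned correspond to stages I through IX; for this made-up series they rise from the initial trough to stage V and fall thereafter, which is the kind of pattern plotted for each series and each cycle in Chart 16.11.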

The National Bureau of Economic Research also makes use of specific- 
cycle analysis. This differs from the procedure already described in that 
the turning points, the stages, and the pattern are determined from each 
individual series itself. We shall not give any further attention to
specific-cycle analysis in this text, except to point out that charts may be
prepared for a particular series, showing both specific cycles and reference
cycles in order that the two may be compared. The cycles may also be
compared by other means, including the computation of “leads” and
“lags” and an “index of conformity.”¹⁰


¹⁰ For a discussion of specific cycles, see A. F. Burns and W. C. Mitchell, Measuring
Business Cycles, National Bureau of Economic Research, Inc., New York, 1946. A
chart showing specific cycles and reference cycles may be found on p. 377.



Symbols Used in Chapter 17 


p: price of a commodity.
P: price index number.
q: quantity of a commodity.
Q: quantity index number.
n: a subscript indicating a given period or the current period.
o: a subscript indicating the base period.
Σ: upper-case Greek sigma, meaning “take the sum of.”

Numerical subscripts written 48,53, for example, may accompany either 
P or Q and indicate a 1953 index on a 1948 base. When written 
53 or 48-50, for example, such subscripts may appear with p or q and 
indicate that the price or quantity referred to is for the year specified 
or is the average (or total) for the years separated by the hyphen. 
CHAPTER 17

Fundamentals in Index Number Construction


MEANING AND USES OF INDEX NUMBERS 

Index numbers are devices for measuring differences in the magnitude
of a group of related variables. These differences may have to do with
the price of commodities, the physical quantity of goods produced,
marketed, or consumed, or such concepts as “intelligence,” “beauty,” or
“efficiency.” The comparisons may be between periods of time; between
places; between like categories, such as persons, schools, or objects.
Thus, we may have index numbers comparing the cost of living at different
times or in different countries or localities, the physical volume of
production in different years, or the efficiency of different school systems.
A few uses to which index numbers are put are described below.

(1) Perhaps the best-known type of index is that of the change in price 
level over a period of time. Such indexes have been in use longer and 
currently are the most numerous. One use of price index numbers, with 
which the reader is already familiar, is that of deflating a value series in 
order to convert it into physical terms. Referring back to Table 11.1, 
we find that hourly wages were reduced to hourly real wages by dividing 
by the Consumer Price Index. Similarly, we might wish to convert a 
time series representing value of construction contracts awarded to a 
physical basis by deflating with an index of construction costs. 
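
Deflation of this kind is a single division, as the following sketch shows. The wage and index figures are hypothetical and are not those of Table 11.1; the function name is likewise invented for the illustration.

```python
def deflate(values, price_index):
    """Divide each value by the price index (base = 100) for its period."""
    return [v / p * 100 for v, p in zip(values, price_index)]

# Hypothetical hourly wages (dollars) and a hypothetical price index.
hourly_wages = [1.50, 1.65, 1.80]
price_index = [100.0, 110.0, 120.0]
real_wages = deflate(hourly_wages, price_index)
print([round(w, 2) for w in real_wages])
```

In this made-up case money wages rise by 20 per cent, but so does the price index, and the real wage stays at 1.50 in base-period dollars throughout.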

Price movements may be studied in order to discover their cause, or 
their effect on the economic community. In order to study such eco- 
nomic relationships, it is customary to compare changes in the price level 
with changes in other series, such as gold, bank reserves, bank deposits, 
bank debits, and the physical volume of production. Studies of this 
nature may involve, not only the average change in price relatives, but
also: (a) dispersion of price relatives; (b) shape of frequency distributions
of price relatives; (c) alterations in the relative positions of such percentages
(displacement of prices); (d) changes in price with changes in
quantity offered for sale; (e) changes in volume of purchases or production
with changes in price (elasticity of demand or supply); (f) frequency with
which different prices change; (g) magnitude of price changes with
changes in demand.

The first draft of this chapter was prepared by Dr. James D. Paris.

Changes in the price level may be measured in order to control them. 
Thus, the increase in official price of gold in 1933-1934 was in part an 
attempt to raise the general price level. If index numbers showed the 
price level to be higher after the price of gold was raised, this result might 
be taken as an indication that the gold policy was effective. 

Occasionally, governmental influence is exercised not to raise, lower, or 
stabilize the price level, but to raise one group of prices relative to 
another. Thus, the United States Government has considered various 
devices, and tried some, to raise agricultural prices to an official “parity”
with industrial prices. A parity index is described in Chapter 18.

In increasing numbers since World War II, collective-bargaining agree- 
ments have been made which provide automatic wage adjustments 
resulting from changes in consumer price index numbers. A few business 
contracts also have been effected to make similar adjustments, based 
upon wholesale price indexes. Such adjustments are generally referred 
to as "escalator (or escalation) clauses." These agreements or contracts
ordinarily contain two sections: one specifies the index to be used, usually 
one made by the United States Bureau of Labor Statistics; the other 
defines the base amount which is to be multiplied by the percentage 
changes in the index. In most wage contracts having escalator clauses, 
provision is made that any downward adjustment shall not be below the 
original base amount. The Bureau of Labor Statistics has estimated 
that there are approximately 3,500,000 workers who are covered by con- 
tracts containing escalator clauses tied to the Consumer Price Index 
issued by that Bureau. 
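
The mechanics of such a clause can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `escalated_wage`, the base wage of $1.50 per hour, and the index readings are all hypothetical, and actual contracts specify their own index, base amount, and adjustment rule.

```python
def escalated_wage(base_wage, base_index, current_index):
    """Adjust a base wage by the percentage change in a price index.

    The adjusted wage is never allowed to fall below the original base
    amount, mirroring the floor found in most wage contracts.
    """
    pct_change = (current_index - base_index) / base_index
    adjusted = base_wage * (1 + pct_change)
    return max(adjusted, base_wage)


# Hypothetical figures, for illustration only.
print(round(escalated_wage(1.50, 100.0, 104.0), 2))  # index rose 4 per cent
print(round(escalated_wage(1.50, 100.0, 97.0), 2))   # index fell; the floor applies
```

The `max` call is what implements the provision that a downward adjustment shall not carry the wage below the base amount.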

Illustrations of average price comparisons between different regions 
are not common. It is very difficult to make such comparisons, since the 
relative importance of goods produced and/or consumed in the different 
places varies so widely. However, the National Industrial Conference 
Board at one time compiled an index of the cost of living in twelve industrial
cities, with the object of comparing the "differences in the cost of
maintaining an established standard of living between different regions
and between different cities in the same region." Also, the Works Progress
Administration issued a report dealing with intercity differences in cost 
of living in 59 cities.

(2) Several organizations compile indexes comparing physical changes
over a period of time. These relate to the physical volume of trade, 
industrial production, factory production, sales, stocks of goods, imports 
and exports, and so forth. We have already used such indexes in our 
analysis of time series. They are extremely useful for the historical 
study of secular trends, seasonal variations, and business cycles, and are 
indispensable for persons who wish to keep abreast of current business 
conditions. 

(3) Forecasting indexes are compiled by most forecasting organiza- 
tions. Although many of the indexes seem sound in theory, and in prac- 
tice when applied to periods before they were actually used, unfortunately 
most of them do not work when put to current use. Some statistical 
aspects of forecasting are discussed in Chapter 22. 

(4) Other varieties of indexes are diverse in nature and few in number. 
As an illustration of one type an index of school efficiency may be men- 
tioned. Following the pioneer work of Leonard P. Ayres, who in 1920 
published index numbers of the rating of state school systems, a number 
of similar studies have been undertaken. Among the factors most com- 
monly combined in the general index are: (a) school days per year; (b) 
per cent of school population attending schools daily; (c) ratio of high 
school enrollment to total enrollment; (d) average expenditure per pupil
in average daily attendance; (e) average expenditure per pupil for purposes
other than salaries; (f) average salary of teachers.

An index number is obtained by combining a number of variables by 
means of a total or an average. This statement will be clarified by 
reference to Table 17.1. In Column 2 there is a single price series of
Florida oranges, and in Column 3 a series of price relatives based upon
these prices, with 1948 = 100, is shown. In Column 4, however, there
is a series of index numbers based on various kinds of citrus fruits,
which may be referred to collectively as a price index. These index
numbers may be constructed by combining year by year, in a manner
which will be described, prices of Florida oranges and prices of other
citrus fruits. Index numbers can be constructed also by averaging the
price relatives of each year separately. The first method is usually
referred to as the aggregative method, while the second is that of averaging
price relatives. These procedures will become clearer as they are developed
more fully.

TABLE 17.1

Prices and Price Relatives of Florida Oranges
Compared with Citrus Fruit Price Index,
1948-1953

              Florida oranges
        ---------------------------   Citrus fruit
        Price       Price relative    price index*
Year    per box     to 1948           1948 = 100
(1)     (2)         (3)               (4)

1948    $3.41       100.0             100.0
1949     4.38       128.4             124.2
1950     5.00       146.6             131.8
1951     4.45       130.5             123.9
1952     3.81       111.7             121.3
1953     4.36       127.9             123.7

* These are the weighted aggregative index numbers of
Table 17.4.

For sources of data, see Table 17.3.
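
The price relatives in Column 3 of Table 17.1 can be reproduced from the prices in Column 2 with a short Python sketch. The function name `price_relatives` is an illustrative choice, not something taken from the text; the prices are those shown in the table.

```python
# Prices per box of Florida oranges, from Column 2 of Table 17.1.
prices = {1948: 3.41, 1949: 4.38, 1950: 5.00, 1951: 4.45, 1952: 3.81, 1953: 4.36}


def price_relatives(prices, base_year):
    """Express each price as a percentage of the base-year price."""
    base = prices[base_year]
    return {year: round(p / base * 100, 1) for year, p in prices.items()}


relatives = price_relatives(prices, base_year=1948)
for year, rel in relatives.items():
    print(year, rel)
```

A price relative involves one commodity only; an index number, by contrast, combines several such series, which is what the remainder of the chapter develops.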

PROBLEMS IN THE CONSTRUCTION OF INDEX NUMBERS 

Among the problems which the statistician encounters in index-number 
construction are: 

(1) Definition of the purpose for which the index is being compiled. 

(2) Selection of series for inclusion in index. 

(3) Selection of sources of data. 

(4) Collection of data. 

(5) Selection of base. 

(6) Method of combining data. 

(7) System of weighting. 

Before gathering data and making calculations, it is important to know 
what we are trying to measure, and also how we intend to use our meas- 
ures. An index number properly designed for the purpose in hand is a 
most useful and powerful tool; if not properly compiled and constructed, 
it can be a dangerous one. If we wish to know changes in the cost of con- 
structing private dwellings, we should not gather prices of heavy struc- 
tural steel. Similarly, if we wish to measure the changes in family 
clothing costs, we should not gather prices of cotton by the bale. To 
measure the course of retail trade, we should use a sample of department
store sales, and not data from jobbers and wholesalers.

When attempting to measure the well-being of the consumer by converting
his money income into "real" income, that is, by deflating (see
Table 11.1), it obviously would be a mistake to use a wholesale price
series as a deflator. Also, if we wish to measure the output of goods 
available to the consumer, we would not use an index of industrial pro- 
duction, but rather attempt to compile an index from various consumer 
goods industries. 

Not all of the seven problems listed above are of equal importance, nor
are they always independent of one another. Thus, a simple system of
weighting would require a different, and usually larger, list of commodities
for a price index than would a method that employs a separate weighting 
system for each subgroup of an index. Likewise, as will be explained 
later, the weighting system to use depends in part upon the method of 
combining the data. It is convenient to include both the method and 
the system of weighting in one formula, and to discuss both points in the 
same section. Likewise, problems 2 and 3, noted above, should be con- 
sidered together. A more complete understanding of these points may 
result if the behavior of price relatives is considered first. 

AN ILLUSTRATION OF THE BEHAVIOR OF PRICE 
RELATIVES 

The United States Bureau of Labor Statistics at the present time 
compiles an index of wholesale prices consisting of approximately 2,000 
separate commodities or series. This index is described in the following 
chapter. The Bureau also publishes wholesale price indexes for a number 
of groups and subgroups and price relatives for individual commodities. 

The indexes for all commodities combined and for the three major 
subgroups are shown in Chart 17.1. To facilitate comparison, the four 
indexes are shown with 1947 = 100 instead of in relation to the published 
base, 1947-1949 = 100. This is accomplished by dividing each index
by its value for 1947. One of the three subgroups, "all commodities
other than farm products and processed foods," is further analyzed in
Chart 17.2, which shows the range of the most divergent of the 13 sub- 
groups of this major group. 
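
Shifting an index to a new base, as was done for Chart 17.1, amounts to dividing every value by the value for the new base period. A minimal sketch follows. Since the Bureau's wholesale figures are not reproduced in the text, the citrus fruit index of Table 17.1 is used as a stand-in, and the choice of 1950 as the new base is arbitrary; the function name `shift_base` is likewise hypothetical.

```python
def shift_base(index, new_base_year):
    """Re-express an index series so that new_base_year equals 100."""
    divisor = index[new_base_year]
    return {year: round(value / divisor * 100, 1) for year, value in index.items()}


# Citrus fruit price index (1948 = 100), Column 4 of Table 17.1,
# used here only as a stand-in for the wholesale price indexes.
citrus_index = {1948: 100.0, 1949: 124.2, 1950: 131.8,
                1951: 123.9, 1952: 121.3, 1953: 123.7}

rebased = shift_base(citrus_index, new_base_year=1950)
print(rebased)
```

Because every value is divided by the same number, the ratios between years are unchanged (apart from rounding); only the level at which the series is expressed is altered.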

Note from Chart 17.1 that the "other commodities" index, which consists
mainly of industrial series, fluctuates least, and that the farm products
index fluctuates most. The explanation lies in the fact that, not
only is agriculture, even today, composed of a large number of small- 
scale farms, and not only is total farm production not very responsive to 
price changes, but also the demand for farm products is inelastic. On 
the other hand, many manufacturing and distributive industries are 
composed of large-scale enterprises the output of which can be more 
responsive to price changes. Furthermore, the demand for finished 
products is more elastic than is the demand for agricultural products. 

To give some idea of the dispersion of price series, Chart 17.2 shows the 
range of fluctuation of the 13 subgroups which make up the "other commodities"
group. In that chart, the deviations from the group index are
plotted to show the range, in any one year, for that subgroup which regis- 
tered the largest number of percentage points above the group index and 
for that subgroup falling furthest below the group index. In 1951 and 
1952 the price index for rubber and rubber products rose so far above the
other subgroups that it is shown by means of a light broken line; the points
on the solid curve for 1951 and 1952 represent the next-to-the-highest
subgroup. 
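
The range plotted in Chart 17.2 can be computed, for any one year, as sketched below. The group index (95.3) and the two extreme subgroup values (101.4 and 90.9) are the 1947 figures quoted in the chart caption; the three intermediate subgroup values are hypothetical fillers, and the function name is an assumption made for illustration.

```python
def extreme_deviations(subgroup_indexes, group_index):
    """Return the largest positive and the largest negative deviation
    of the subgroup indexes from the group index."""
    deviations = [value - group_index for value in subgroup_indexes]
    return round(max(deviations), 1), round(min(deviations), 1)


# 95.3, 101.4, and 90.9 are from the caption of Chart 17.2;
# the middle three subgroup values are invented for illustration.
group_1947 = 95.3
subgroups_1947 = [101.4, 97.0, 95.0, 93.2, 90.9]

high, low = extreme_deviations(subgroups_1947, group_1947)
print(high, low)
```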

Of special interest in Chart 17.2 is the fact that the further we go from 
the base period, the greater the tendency for subgroup prices to diverge 
from the group index. However, should the group index turn down and
approach 100, it is quite likely that the subgroup indexes will draw closer
together again.

Chart 17.1. United States Bureau of Labor Statistics Index Number
of Wholesale Prices of All Commodities, Farm Products, Processed
Foods, and Commodities Other Than Farm Products and
Processed Foods, 1947-1953. The figures have been converted from
1947-1949 = 100 to 1947 = 100 in order that the behavior of the four series
may be compared more readily. Data from U. S. Department of Commerce,
Office of Business Economics, Business Statistics, 1953, p. 27, and Survey of
Current Business, February 1954, outside back cover page.

Another point often stated, but not borne out by the data covering the
limited period here studied, is that when the price trend is upward, the
distribution of the price relatives of the component series of an index is
also positively skewed. Many persons are of the opinion that this is an
inherent characteristic of frequency distributions of price relatives, since
prices can increase indefinitely, but can decline only to zero.¹ On the
other hand, it may be suggested that prices and price relatives are
dominated more by the laws of economics than by those of mathematics.
The limits of price advances or price declines are certainly influenced by
the willingness of persons to buy and to sell at different prices. However,
the direction of price change probably has some effect upon the direction
of the skewness of the components of an index.

¹ This is not literally true, as can be seen from the following examples: (1) United
States Treasury bills, usually 90-day paper, are ordinarily sold to banks and other
investors at a discount; that is, they are sold for less than face value and redeemed
three months later at face value. The difference measures the yield to the investor,
or the price to the Treasury. In the period from October 1940 through February
1941, the Treasury sold 12 series of bills for more than face value, in effect paying a
negative price. The bill buyers, getting a negative yield, paid a slight premium for
the privilege of holding the bills. (2) Early in 1945 a metal goods manufacturer in
New York City was able to sell magnesium shavings and other magnesium scrap to
dealers. Later in the same year, he was unable to dispose of it by sale and had to
have it carted away. Thus, the positive price which he had received for the scrap
early in the year became a negative, or below-zero, price later in the year.

Chart 17.2. Maximum Deviations from the United States Bureau of
Labor Statistics Index Number for "All Commodities Other than Farm
Products and Processed Foods" Shown by the 13 Sub-groups Comprising
that Index, 1947-1953. Deviations represent the difference between the most
divergent sub-groups and the "Other Commodities" index number for each year; for
example, in 1947 the figures were 101.4 - 95.3 = +6.1 and 90.9 - 95.3 = -4.4.
The light broken line follows the deviations shown for 1951 and 1952 by Rubber
and Rubber Products, which departed markedly from the next highest sub-group
(shown by solid line) for those years. Data from U. S. Department of Commerce,
Office of Business Economics, Business Statistics, 1953, pp. 29-31, and Monthly
Labor Review, March 1954, pp. 365-366.

DATA FOR INDEX NUMBERS

Although the method of combining the variables is of considerable 
importance in constructing index numbers, it is insignificant when com- 
pared with the problem of selecting the data that are the raw materials 
of the index. Too much emphasis cannot be put upon this point. The 
data must be accurate and homogeneous, and the sample representative. 
A sample cannot be expected to be representative unless an adequate 
number of items is included. To state the idea in other language: a 
sufficiently large sample of relevant items must be selected to obtain 
reliable index numbers. 

As noted before, the commodities to be chosen for a price index, and 
the type of quotation to be selected, depend on what is being measured. 
A wholesale price index requires wholesale prices. An index of prices 
paid by consumers necessitates not only retail prices of food, but rents, 
gas and electric rates, clothing prices, transportation, medical care, and 
so forth, applying to the class of persons for whom the cost of living is to
be ascertained. An index of the changing cost of constructing frame 
houses in Atlanta, Georgia, should include those materials and items of 
labor that are used in frame houses built in Atlanta. The prices should 
be the Atlanta prices of those materials and the wages should be the 
wages in Atlanta of the kind of labor used. These examples indicate 
one reason why it is important to bear in mind at all times the purpose 
for which the index is being compiled. The purpose of the index and just 
what it seeks to measure will also influence the selection of the base, the 
weights used, and the formula employed. 

When selecting the sources of data for index numbers, we may rely on 
regularly published quotations or obtain periodic special reports from the 
merchants, producers, exporters, or others who possess the basic informa- 
tion needed. Under either circumstance, we must make sure that the 
data pertain strictly to the thing being measured. Thus, if retail food 
price changes are being measured, quotations should be from super- 
markets, chain stores, independent stores, and any other important 
outlets. These different sources should not be mixed indiscriminately, 
but should be appropriately weighted when combined. Neither should 
first-of-the-month quotations, middle-of-the-month quotations, and end- 
of-the-month quotations ordinarily be combined in one index. 

The discussion immediately following is in part an application of principles
discussed in earlier chapters of this book, especially Chapter 2.
The great importance of the proper choice of data for index numbers
justifies a bringing together of these principles, even though some duplication
is involved.

Accuracy. Some statistical data that appear in precise printed form
cannot be depended upon. If the person or company reporting the data
uses the data for operational or tax purposes, they are likely to be accurate;
but if the data are merely statistical reports furnished to an outside
agency, they may be compiled originally by careless and indifferent clerks 
whose sole interest is in filling the form with ink marks as quickly as 
possible. It therefore behooves the statistician to ascertain how the data 
are collected, and to select his source with discrimination. 

Comparability. Standard grades of the same commodity are, of 
course, comparable between different dates; however, a 1914 automobile 
cannot be compared with a present-day automobile. Nor could the price 
of a "standard" automobile be computed for different years, since in not
more than one year could such a standard automobile ordinarily be found.
In the case of highly manufactured goods, which are further developed 
over the years, the upward bias of price quotations is greatest; but it is 
present, also, in the case even of some agricultural commodities, since 
their production, also, involves more processing in later than in earlier 
years. It is likely, therefore, that most price index numbers have an 
upward bias. 

A similar problem arises when one article passes out of wide use and its 
place is taken by a different commodity serving somewhat the same purpose.
For instance, the stagecoach of 100 years ago has been superseded
by the streamlined air-conditioned train, the pressurized plane, and the 
de luxe bus. If we should find that the fare from Washington, D. C., to 
Philadelphia were the same in the two periods, we should not conclude 
that the cost of the same service had remained the same, because the 
service, too, has changed. Less time is required to make the trip and it 
is now made in much greater comfort. 

Representativeness. Since index numbers are usually obtained from 
samples, we must try to obtain a sample that behaves like the population 
from which it is drawn. Probably the most satisfactory way of accomplishing
this is to divide the original data into groups and subgroups
and to draw a representative sample from each of these. Stratification 
into groups and subgroups is employed because the various groups and 
subgroups of commodities, affected by different economic factors, may 
be expected to display patterns of behavior which are distinctive to each 
group and also different from other groups and from the over-all index. 
For example, if an index of wholesale prices is being made, we should 
expect price (or quantity) movements of foods to be different from those 
of building materials. One reason for this is that the demand for food 
products is inelastic, while that for building materials (which are durable 
goods, the purchase of which can be postponed) is elastic. Furthermore,
the supply of foods, over short periods of time, is dependent to a considerable
extent on the weather, while the supply of building materials
is subject to conscious control of the fabricators. 

In choosing the commodities from a group, it is desirable to pick ones 
which tend to conform most closely to the central tendency of the group,
if that central tendency can be determined. Having selected commodities
that are reasonably representative of the group from which they were
picked, it is desirable to ascertain whether proportionate representation 
has been obtained for each group. If, upon the basis of dollar value, the 
sample for one group (or groups) constitutes too small or too large a pro- 
portion of the entire group, commodities may be added to or dropped 
from the group sample. When such an adjustment is not feasible (for 
example, if the group were “structural steel” and the sample constituted 
100 per cent of the group), an alternative consists of applying appropriate 
weights.

A further test of the representativeness of the sample can sometimes 
be employed: Do the value changes of the sample coincide with those of 
the population? This test should be applied not only to the whole 
sample, but to the various groups and subgroups into which it is divided.²

Adequacy. In Chapter 24 it will be shown that the reliability of the 
arithmetic mean of a random sample is directly related to the square root 
of the number of items included. Furthermore, in a finite population, 
the larger the proportion of items included in the sample (see Appendix S, 
Section 24.2), the more reliable is the mean of the sample. The absolute 
number of items to use cannot be stated in precise and fixed terms. As 
just noted, commodities (items) are ordinarily selected from the various 
component groups, so that the sample is a stratified one rather than a 
random one. Furthermore, in selecting the items from the groups, the 
more important items are ordinarily chosen first, after which as many 
suitable items are included as resources will permit. Thus, the items are 
not taken at random within each stratum. As a result of these two situ- 
ations, ordinary reliability formulas are not applicable. 
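
For a simple random sample, the two relationships mentioned at the start of this paragraph can be illustrated numerically. The sketch below uses the standard expressions for the standard error of the mean, with the finite population correction for sampling without replacement; these are general textbook formulas rather than quotations from this chapter, and the standard deviation (12) and population size (2,000) are hypothetical. As the paragraph itself cautions, such formulas do not apply to the stratified, non-random selections ordinarily used for index numbers.

```python
import math


def standard_error_of_mean(sigma, n, population_size=None):
    """Standard error of the mean of a simple random sample of n items.

    If population_size is given, the finite population correction for
    sampling without replacement is applied.
    """
    se = sigma / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se


# Hypothetical values: standard deviation 12, population of 2,000 items.
for n in (25, 100, 400):
    print(n,
          round(standard_error_of_mean(12, n), 3),
          round(standard_error_of_mean(12, n, population_size=2000), 3))
```

Quadrupling the sample size halves the standard error, and the correction factor reduces it further as the sample comes to include a larger share of the population.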

For the index-number illustrations used in the remainder of this
chapter, five citrus fruits have been selected: grapefruit, lemons, and three
categories of oranges. For each of these, except grapefruit, the production
figures refer to total production. For grapefruit, the production is
for Florida grapefruit only. The prices for all five fruits are the auction
prices per box on the New York market. The use of these figures involves
some artificiality, first because the total production was used, including
not only "production having value," but also fruit consumed on the farm,
donated to charity, or unharvested or not utilized on account of economic
conditions, as well as fruit used for juice, concentrates, and so on; second,
the price quotation is the average per box for the season at just one
market and does not take account of prices at the other nine auction
markets in the United States, except as they are reflected in the New
York market. For these reasons, the various indexes computed in the
following pages of this chapter must be considered merely as illustrations
of the behavior of the various formulas and weighting schemes which are
discussed.

² This test is similar to Irving Fisher's "total value criterion," which states that the
price index multiplied by the quantity index should equal the ratio of change of the
total value of the population. See Irving Fisher, "The Total Value Criterion,"
Journal of the American Statistical Association, Vol. XXII, December 1927, pp.
419-441.

The season for each fruit begins with the bloom of one year and ends 
with the completion of the harvest the following year. As explained 
below Table 17.2, "1953" indicates the crop year 1952-1953, and
similarly for other years. The fruits used for the calculations which 
follow, their seasons, and the weight per box are: 


                                                                             Net contents
Fruit                                  Season                                per box

Grapefruit, Florida                    Sept. 1 to July 31                    80 pounds
Lemons, California                     Nov. 1 to Oct. 31                     79 pounds
Oranges, Florida                       Oct. 1 to July 31                     90 pounds
Oranges, California, both varieties    Oct. 1 to Dec. 31 of following year   77 pounds


SELECTION OF BASE 

Regardless of the formula employed for weighting and combining the 
data, it is customary (although not necessary) to select some period of 
time as 100 per cent with which to compare the other index numbers. A 
month is ordinarily too short a period to use as base period, since any one 
month is likely to be unusual on account of accidental or seasonal influ- 
ences. A year is sometimes used. However, it is often true that no one 
year is sufficiently "normal" to be a good basis of comparison. Business
and prices are always advancing or receding with the business cycle.
Though not so specific, an average of several years is usually a better
base. The period 1910 through 1914 has sometimes been used as a price 
base, while the 1923-1925 average has been used for quantity indexes. 
In the past two decades, the statistical agencies of the United States 
Government have successively shifted to several other bases: for example, 
1926, 1935-1939, 1947-1949, and special-purpose ones, such as September 
1, 1939 and June 1950. A useful solution is to employ the period of years
that is used by some of the other indexes with which the one being constructed
is likely to be employed.
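
Using an average of several years as the base involves no new principle: the series is divided by its mean over the base period instead of by a single year's value. The sketch below applies this to the simple aggregates of citrus prices given in Table 17.2 later in this chapter. The three-year base 1948-1950 is an arbitrary choice made for illustration, not one used in the text, and the function name is hypothetical.

```python
# Simple aggregates of citrus fruit prices (dollars), from Table 17.2.
aggregates = {1948: 23.01, 1949: 28.16, 1950: 28.37,
              1951: 27.48, 1952: 28.29, 1953: 27.47}


def index_on_period_base(series, base_years):
    """Express a series as a percentage of its average over base_years."""
    base = sum(series[y] for y in base_years) / len(base_years)
    return {year: round(value / base * 100, 1) for year, value in series.items()}


# The base period 1948-1950 is chosen arbitrarily for illustration.
index = index_on_period_base(aggregates, base_years=(1948, 1949, 1950))
print(index)
```

With a multi-year base no single year need equal 100; rather, the index numbers for the base years average 100.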

Although a particular base may be satisfactory for a number of years, 
that base becomes less meaningful as time passes, and it eventually 
becomes desirable to shift to a more recent period. Among the reasons 
are: (1) the dispersion of price relatives may become so great that no 
average is reliable; (2) because of permanent currency depreciation, 
growth of population, technological developments, and other reasons, 
new and higher levels may have been attained by income, prices, produc- 
tion, and consumption; (3) the pattern of consumption may change to 
such an extent that no aggregate of commodities can be found which 
includes the major expenditures common to both periods; (4) the quality 
of many commodities, nominally the same, changes progressively with 
time. An indirect basis of comparison may be had by utilizing a chain 
index system, which involves, essentially, the comparison of each year (or 
sub-period thereof) with the preceding year. This method, which is not 
completely satisfactory, is explained in the following chapter. 

AGGREGATIVE PRICE INDEX NUMBERS 

It has already been stated that there are two methods of constructing 
index numbers: (1) by computing aggregate values; (2) by averaging rela- 
tives. By the first method, as will be explained in this section, the prices 
or quantities are made comparable, are automatically weighted by being 
reduced to dollar values, and then are combined into aggregate values. In 
the following section the method of averaging relatives will be explained. 
There it will be shown that the two methods are, under certain conditions, 
merely alternative methods of obtaining the same result. The aggregative
method obtains the result directly, and produces a result that has a
simple and clear meaning; the method employing relatives is more
roundabout, and its meaning is more technical. Nevertheless, there are
situations in which the aggregative method is not applicable, and recourse 
must then be had to the averaging of relatives. 

Simple aggregates. Table 17.2 illustrates the construction of a
simple aggregative price index. The prices of each commodity in any
given year are merely added together to give the index number for that
year. It is then frequently convenient to designate some year as a base,
which is set equal to 100. In this illustration all of the index numbers are
expressed in the final row as a percentage of the 1948 number, found by
dividing each one of the numbers by the value in the base period ($23.01)
and multiplying by 100.
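
The calculation just described can be sketched in Python as follows. The prices are those of Table 17.2; the function and variable names are illustrative assumptions.

```python
YEARS = (1948, 1949, 1950, 1951, 1952, 1953)

# Prices per box, from Table 17.2.
PRICES = {
    "Grapefruit":                    (3.30, 4.00, 5.32, 4.31, 4.01, 4.40),
    "Lemons":                        (6.82, 7.85, 7.70, 7.45, 7.84, 7.61),
    "Oranges, Florida":              (3.41, 4.38, 5.00, 4.45, 3.81, 4.36),
    "Oranges, California, Navel":    (5.16, 6.62, 5.23, 5.77, 7.05, 5.33),
    "Oranges, California, Valencia": (4.32, 5.31, 5.12, 5.50, 5.58, 5.77),
}


def simple_aggregative_index(prices, years, base_year):
    """Add the unit prices for each year, then express each total
    as a percentage of the base-year total."""
    totals = {year: sum(series[i] for series in prices.values())
              for i, year in enumerate(years)}
    base = totals[base_year]
    return {year: round(total / base * 100, 1) for year, total in totals.items()}


index = simple_aggregative_index(PRICES, YEARS, base_year=1948)
print(index)
```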

It must be apparent that the influence which a commodity exerts on a
simple aggregative index depends on the price per unit of quotation. In
this instance, the predominant item was lemons; if grapefruit or Florida
oranges had been quoted at wholesale by the carload instead of by the
box, they would largely have determined the course of the index. The
weighting of an aggregative index by one commercial unit of each commodity
represented, then, is illogical in that it neglects to consider the
actual importance of the different commodities; it is haphazard in that
the relative influence of the different commodities is determined by factors
quite irrelevant to the purpose of the price index. The problem would
in no sense be solved if all commodities were reduced to a price per
pound, for some commodities, such as diamonds, are very costly per
pound and yet are not very important in our economic life, while coal,
which is of tremendous importance, is relatively cheap per pound.
Furthermore, some goods, such as electric power or human labor, cannot
be reduced to a pound basis. Still another solution is to take as the unit
of quotation the amount that can be purchased for one dollar in the base
year. But this is scarcely more logical, since it would be very unusual
if the same amount of money were spent on each commodity in every
year.
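
The dependence of the simple aggregative index on the unit of quotation can be seen by changing the unit for one commodity and recomputing. In the sketch below the 1948 and 1953 prices are from Table 17.2; quoting grapefruit by the lot of 100 boxes is a hypothetical stand-in for quotation by the carload, and the `units` parameter is an assumption introduced for illustration.

```python
# Prices per box in 1948 and 1953, from Table 17.2.
PRICES = {
    "Grapefruit":                    (3.30, 4.40),
    "Lemons":                        (6.82, 7.61),
    "Oranges, Florida":              (3.41, 4.36),
    "Oranges, California, Navel":    (5.16, 5.33),
    "Oranges, California, Valencia": (4.32, 5.77),
}


def simple_index(prices, units=None):
    """Simple aggregative index of the second year on the first.

    `units` gives, for any commodity named in it, the number of boxes
    making up one unit of quotation (default: one box).
    """
    units = units or {}
    base = sum(p[0] * units.get(name, 1) for name, p in prices.items())
    given = sum(p[1] * units.get(name, 1) for name, p in prices.items())
    return round(given / base * 100, 1)


per_box = simple_index(PRICES)
# Hypothetical change of unit: grapefruit quoted per lot of 100 boxes.
grapefruit_by_lot = simple_index(PRICES, units={"Grapefruit": 100})
print(per_box, grapefruit_by_lot)
```

With grapefruit quoted by the lot, the index for 1953 moves from the value shown in Table 17.2 to a figure close to the price relative of grapefruit alone (4.40 / 3.30, or about 133), although nothing about the actual prices has changed.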

Before consideration of the construction of weighted aggregative index 
numbers, it may be helpful to state symbolically the method we have just 
used. The formula is

$$P = \frac{\sum p_n}{\sum p_o}$$

where P means price index, p refers to the price of an individual commodity,
the subscript o refers to the base period, from which price changes
are measured, and the subscript n refers to the given period which is
being compared with the base. Now if the formula for a particular year
(say 1953, with 1948 being the base) is to be stated, it could be written

$$P_{48,53} = \frac{\sum p_{53}}{\sum p_{48}}$$

Weighted aggregates. In order to allow each commodity to have a 
reasonable influence on the index, it is advisable to use a deliberately 
weighted rather than a simple aggregate of prices, which, as we have seen, 
involves concealed weighting. To construct a weighted aggregative 
index, a list of definite quantities of specified commodities is taken, and 
calculations are made to determine what this aggregate of goods is worth 
each year at current prices. Obviously the process is merely that of 
multiplying each unit price by the number of units and summing the 
resulting values for each period. The procedure, using the quantities 
produced in 1948 as multipliers, is illustrated in Table 17.3. The reader,
having followed the reasoning to this point, will realize now that aggregative
index numbers of price measure the changing value of a fixed aggregate of
goods. Since the total cost or value changes while the components of the
aggregate do not, these changes must be due to price changes. It appears
that this type of index number measures the very thing sought if we wish
to determine changes in the cost of living, that is, the cost of a fixed
"market basket" of goods and services. The general formula for the
aggregative price index is

$$P = \frac{\sum p_n q}{\sum p_o q}$$

The symbols are those used earlier, but a new one has been added: q refers
to the quantity of the commodity produced, marketed, or consumed (that
is, the quantity weight, or multiplier). Since the index numbers constructed
in Table 17.3 were weighted by base-year quantities, we may
write the formula more specifically

$$P = \frac{\sum p_n q_o}{\sum p_o q_o}$$

TABLE 17.2

Construction of Simple Aggregative Index Numbers of Citrus Fruit Prices,
1948-1953*

(Prices are per box.)

Fruit                                1948     1949     1950     1951     1952     1953

Grapefruit                          $3.30    $4.00    $5.32    $4.31    $4.01    $4.40
Lemons                               6.82     7.85     7.70     7.45     7.84     7.61
Oranges, Florida                     3.41     4.38     5.00     4.45     3.81     4.36
Oranges, California, Navel           5.16     6.62     5.23     5.77     7.05     5.33
Oranges, California, Valencia        4.32     5.31     5.12     5.50     5.58     5.77

Aggregate                          $23.01   $28.16   $28.37   $27.48   $28.29   $27.47
Index number (per cent of 1948)     100.0    122.4    123.3    119.4    122.9    119.4

* The crop year 1947-1948 is designated 1948, and similarly for other years, since most harvesting
and consequently the marketing occurs in the later year.

Data from U. S. Department of Agriculture, Agricultural Statistics, 1953, p. 179, and Bureau of
Agricultural Economics, Crop Reporting Board, October 31, 1953 press release, "Citrus Fruits,
Production, Farm Disposition, Value, and Utilization of Sales, Crop Seasons 1951-52 and 1952-53."


TABLE 17.3

Construction of Aggregative Index Numbers of Citrus Fruit Prices, 1948-1953,
Weighted by Production in 1948*

(Quantities in thousands of boxes; values in thousands of dollars.)

                                    1948        Value of 1948 quantity at price of specified year
Fruit                            production     1948      1949      1950      1951      1952      1953

Grapefruit                         33,000     108,900   132,000   175,560   142,230   132,330   145,200
Lemons                             12,870      87,773   101,030    99,099    95,882   100,901    97,941
Oranges, Florida                   58,400     199,144   255,792   292,000   259,880   222,504   254,624
Oranges, California, Navel         18,900      97,524   125,118    98,847   109,053   133,245   100,737
Oranges, California, Valencia      26,930     116,338   142,998   137,882   148,115   150,269   155,386

Aggregate value                      ...      609,679   756,938   803,388   755,160   739,249   753,888
Index number (per cent of 1948)      ...        100.0     124.2     131.8     123.9     121.3     123.7

* See note to Table 17.2 concerning crop years.

Based on price data in Table 17.2 and production data from Agricultural Statistics, 1950, p. 193.

Comparing Tables 17.2 and 17.3, it will be seen that, in the simple 
aggregative index, lemons were of greatest importance because they had 
the highest price per box; but, when base-year quantity weights were 
introduced, Florida oranges became most important. 
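
The two constructions just compared can be retraced in a few lines of Python. The sketch below is illustrative only: the prices and 1948 quantities are those of Tables 17.2 and 17.3, while the function and variable names are assumptions introduced here. The simple aggregative index is treated as the special case in which every commodity has a weight of 1.

```python
# Prices per box, 1948-1953 (Table 17.2).
YEARS = [1948, 1949, 1950, 1951, 1952, 1953]
PRICES = {
    "Grapefruit": [3.30, 4.00, 5.32, 4.31, 4.01, 4.40],
    "Lemons": [6.82, 7.85, 7.70, 7.45, 7.84, 7.61],
    "Oranges, Florida": [3.41, 4.38, 5.00, 4.45, 3.81, 4.36],
    "Oranges, California, Navel": [5.16, 6.62, 5.23, 5.77, 7.05, 5.33],
    "Oranges, California, Valencia": [4.32, 5.31, 5.12, 5.50, 5.58, 5.77],
}
# 1948 production in thousands of boxes (Table 17.3).
Q_1948 = {
    "Grapefruit": 33000,
    "Lemons": 12870,
    "Oranges, Florida": 58400,
    "Oranges, California, Navel": 18900,
    "Oranges, California, Valencia": 26930,
}


def aggregative_index(prices, weights, base=0):
    """Return 100 * sum(p_n * q) / sum(p_0 * q) for every period."""
    n_periods = len(next(iter(prices.values())))
    aggregates = [
        sum(prices[name][t] * weights[name] for name in prices)
        for t in range(n_periods)
    ]
    return [round(100 * a / aggregates[base], 1) for a in aggregates]


def base_year_shares(prices, weights, base=0):
    """Share of each commodity in the base-year aggregate."""
    values = {name: prices[name][base] * weights[name] for name in prices}
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


unit_weights = {name: 1 for name in PRICES}
simple = aggregative_index(PRICES, unit_weights)    # Table 17.2
weighted = aggregative_index(PRICES, Q_1948)        # Table 17.3

simple_shares = base_year_shares(PRICES, unit_weights)
weighted_shares = base_year_shares(PRICES, Q_1948)

print(dict(zip(YEARS, simple)))
print(dict(zip(YEARS, weighted)))
print(max(simple_shares, key=simple_shares.get))      # most important, unweighted
print(max(weighted_shares, key=weighted_shares.get))  # most important, weighted
```

The shares make the shift in importance explicit: in the unweighted aggregate the commodity with the highest price per box dominates, whereas with 1948 quantities the commodity with the largest base-year value dominates.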

Selection of weights. Although in the preceding illustration 1948 
quantities were used as weights, this simple procedure is but one of several 
possible systems. It would have been just as easy to have taken, say, 
1953 quantities as weights. If the quantity of each commodity marketed 
changed from year to year in the same proportion, it would make no 
difference to what period the weights referred, for the results would be 
identical. In fact, however, the relative importance of the different 
commodities is constantly changing, and this is due in part to the change 
in the relative prices of the different commodities, which in turn result 
from changes in supply and demand. Therein lies a great source of 
difficulty for which there is no completely satisfactory solution. The 
answer depends in part on what the analyst thinks a price index is sup- 
posed to do. 
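
The claim that proportional changes in quantities leave the index unchanged is easily checked. In the sketch below the 1948 and 1953 prices and the 1948 quantities are taken from the citrus fruit tables; the second and third sets of weights are hypothetical, invented here only to contrast a proportional change with a non-proportional one.

```python
# Prices per box in 1948 and 1953 and 1948 production (thousands of boxes),
# in the order grapefruit, lemons, Florida oranges, navel, Valencia.
P_1948 = [3.30, 6.82, 3.41, 5.16, 4.32]
P_1953 = [4.40, 7.61, 4.36, 5.33, 5.77]
Q_1948 = [33000, 12870, 58400, 18900, 26930]


def aggregative_index(p0, pn, q):
    """100 * sum(p_n * q) / sum(p_0 * q) for one pair of periods."""
    numerator = sum(p * w for p, w in zip(pn, q))
    denominator = sum(p * w for p, w in zip(p0, q))
    return 100 * numerator / denominator


# Hypothetical weights: every quantity 50 per cent larger than in 1948.
proportional_q = [1.5 * q for q in Q_1948]

# Hypothetical weights: only Florida oranges change (doubled).
shifted_q = list(Q_1948)
shifted_q[2] = 2 * shifted_q[2]

index_base = aggregative_index(P_1948, P_1953, Q_1948)
index_proportional = aggregative_index(P_1948, P_1953, proportional_q)
index_shifted = aggregative_index(P_1948, P_1953, shifted_q)

print(round(index_base, 1), round(index_proportional, 1), round(index_shifted, 1))
```

The common factor 1.5 cancels from numerator and denominator, so the first two results agree; the third differs because the relative importance of the commodities has changed.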

One view is that such an index number measures the changing cost of a
constant aggregate of goods. Another view concerns itself not with the
goods level of analysis, but with the satisfactions level; an index number,
according to this view, should measure the changing cost of aggregates of
goods yielding the same utility or satisfaction at two periods, or two
places. Thus, suppose we compare the cost of living of two groups of
similar persons at two periods (or places), these groups having at the two
periods (or places) the same tastes and capacity for enjoyment, as well as
an income that will purchase, and does purchase, the same amount of
satisfaction.³ The commodities, of course, will be different, but if the
expenditures were $4,000 the first year and $4,800 the second year, we
may conclude that the cost of living has gone up 20 per cent. It goes
without saying that no one has accurately made a measurement of this
kind. Although it seems feasible to measure only the varying value of a
fixed aggregate of goods, yet the analyst should select a list of goods that
will avoid the certainty of bias in a known direction with respect to the
cost of obtaining equal satisfactions at different times. The following
suggestions have been made for solving this knotty problem.

³ See J. M. Keynes, A Treatise on Money, Vol. I, pp. ^6-99. Harcourt, Brace, &
Co., New York, 1930.

1. Use base-period quantities as weights. This is the method we have
used for illustrative purposes in Table 17.3. However, even if there has
been no change in the tastes or environment of purchasers between the
two periods, purchases of those commodities that have increased
relatively in price will decline relatively, and purchases of commodities that
have decreased relatively in price will increase relatively. It is entirely
possible that this type of index might record an increase in the price level,
whereas by increasing the relative amounts purchased of commodities
that decline in price, the same amount of satisfaction might actually be
bought by a given individual at a lower total cost. This type of index,
then, has in a sense an upward bias. It might be said that this index
marks an upper limit to the price change. This method is sometimes
known as Laspeyres' method, and, as previously stated, can be defined
symbolically,

    P = \frac{\sum p_n q_0}{\sum p_0 q_0}.

2. Use given-period quantities. That is, use the weights that pertain 
to the year which is to be compared with the base period. This method 
involves the selection of a new set of weights each year, or even more 
often. But frequently it is impossible to obtain current quantity weights, 
and, even if they are available, the labor of computation is approximately 
doubled. Furthermore, although each period is thereby directly com- 
parable with the base year, the comparison of the different years among 
themselves is not valid, for the reason that the aggregate of goods differs 
each year. 

If we think of 1948 as being the base period for an index of consumers'
prices, the base-year weighting system answers the question: If it cost
me $100 a month to live in 1948, how much would it cost me this year to
live the way I did that year? The given-year weighting system answers
a different question: If I could have supported my present scale of living
in 1948 with $100 per month, how much must I spend this year? A
theoretical objection to asking such a question is that undue weight is
given to the commodities that have declined in price. It is the relative
decline in price that may be responsible for their increased purchase, and,
although it is price change which we are trying to measure, yet our
weighting is partly determined by relative price changes. Thus this
method may be said to have a downward bias, and marks the lower limit
of price change. It is sometimes known as Paasche's method and has the
following formula:

    P = \frac{\sum p_n q_n}{\sum p_0 q_n}.

3. Use the average (or total) quantities of base and given years. This is
a compromise solution, although it is one which has no general bias in
any known direction. But again, as in method 2, we have shifting
weights and a resulting lack of comparability among the different years.
The method was proposed independently by the English economists
Marshall and Edgeworth, and the formula

    P = \frac{\sum p_n (q_0 + q_n)}{\sum p_0 (q_0 + q_n)}

is sometimes called the Marshall-Edgeworth formula.

4. Average together the quantities for all the years which the index numbers 
include. Though perhaps an excellent solution for a historical study, this 
plan is impracticable if the index is to be kept up to date, since it means 
current revision of weights and continuous recomputation of the complete 
set of index numbers. 

5. Average together the quantities of several years which are thought to be 
typical. This again is a compromise solution, but it is practical and is 
very frequently adopted. The list of quantities used will, however, even- 
tually become obsolete. When that is the case, a new index can be con- 
structed and spliced to the old one. Methods for so doing will be con- 
sidered in the following chapter. The construction of an index number 
of 1953 citrus fruit prices, using as weights the average quantities for 1948, 
1949, and 1950, is illustrated in Table 17.4. The index number varies 
only five-tenths from that employing base-year weights. The formula
for this particular index number may be written

    P = \frac{\sum p_{53} q_{48-50}}{\sum p_{48} q_{48-50}}.

Of course, the results are the same whether average-quantity or total- 
quantity weights are used. 

6. Determine the highest common factor. The weights are the quantities
of each commodity common to each year, either to the base and the given
year, or to all the years under comparison. In the latter case, this would
mean that, for any commodity, the smallest amount marketed in any of
the years under comparison would be taken. Usually, then, the
quantities of the different commodities taken would not each be for the same
year. This ingenious device has been suggested by J. M. Keynes⁴ to
avoid the sort of bias inherent in methods 1 and 2, already described. Its
virtue is its modesty: the device avoids trying that which cannot be done
perfectly. However, if the values of quantities that are common to the
different periods are small compared with total expenditures, or if they
constitute in different periods a varying proportion of the total, or if the
satisfaction derived from this aggregate of goods varies, the method is no
more accurate and, quite likely, is less accurate than method 5.


TABLE 17.4

Construction of 1953 Aggregative Index Number of Citrus Fruit Prices,
Weighted by Production* in 1948, 1949, and 1950

(Production in thousands of boxes; values in thousands of dollars.)

                                                           Total      Average                    Value of 1948-1950
                                       Production        production  production  Price per box   average production
                                                         1948-1950   1948-1950                      at price in
Fruit                              1948    1949    1950                           1948    1953      1948       1953
Grapefruit                       33,000  30,200  24,200     87,400     29,130    $3.30   $4.40    96,129    128,172
Lemons                           12,870  10,010  11,360     34,240     11,410     6.82    7.61    77,816     86,830
Oranges, Florida                 58,400  58,300  58,600    175,200     58,400     3.41    4.36   199,144    254,624
Oranges, California, Navel       18,900  11,910  15,630     46,440     15,480     5.16    5.33    79,877     82,508
Oranges, California, Valencia    26,930  25,100  26,230     78,260     26,090     4.32    5.77   112,709    150,539
Aggregate value                     ...     ...     ...        ...        ...      ...     ...   565,675    702,673
Index number (per cent of 1948)     ...     ...     ...        ...        ...      ...     ...     100.0      124.2

* The index number is the same whether the weights used are total or average production for the
three years. See note to Table 17.2 concerning crop years.

Data from sources given below Table 17.3 and from Agricultural Statistics, 1950, p. 193, and 1951,
p. 178.


7. Make two index numbers, each with a different set of weights, and
average the two together, usually geometrically. The two systems of
weighting chosen are ordinarily base- and given-year weights. The formula
then becomes

    P = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}}.

It is frequently called Fisher's "ideal" index number, because it conforms
to certain tests of consistent behavior which Irving Fisher considered
appropriate.⁵ On the other hand, it is difficult to say precisely just what
such an index number does measure.
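
The formulas of methods 1, 2, 3, and 7 differ only in the quantities used as weights, which a short sketch may make plain. The two-commodity data below are wholly hypothetical (they are not taken from the citrus fruit tables) and are chosen so that purchases shift away from the commodity whose price has risen; the function names are likewise assumptions made for illustration.

```python
import math


def aggregate(prices, quantities):
    """Sum of price times quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p0, pn, q0, qn):
    """Method 1: base-period quantities as weights."""
    return 100 * aggregate(pn, q0) / aggregate(p0, q0)


def paasche(p0, pn, q0, qn):
    """Method 2: given-period quantities as weights."""
    return 100 * aggregate(pn, qn) / aggregate(p0, qn)


def marshall_edgeworth(p0, pn, q0, qn):
    """Method 3: total (or average) of base and given quantities."""
    q = [a + b for a, b in zip(q0, qn)]
    return 100 * aggregate(pn, q) / aggregate(p0, q)


def fisher_ideal(p0, pn, q0, qn):
    """Method 7: geometric mean of the indexes of methods 1 and 2."""
    return math.sqrt(laspeyres(p0, pn, q0, qn) * paasche(p0, pn, q0, qn))


# Hypothetical data: the first price doubles and its quantity falls;
# the second price is unchanged and its quantity rises.
p0, pn = [2.0, 5.0], [4.0, 5.0]
q0, qn = [10, 4], [6, 6]

L = laspeyres(p0, pn, q0, qn)
P = paasche(p0, pn, q0, qn)
ME = marshall_edgeworth(p0, pn, q0, qn)
F = fisher_ideal(p0, pn, q0, qn)
print(round(L, 1), round(P, 1), round(ME, 1), round(F, 1))

# One of Fisher's tests: interchanging the two periods gives the reciprocal.
F_reversed = fisher_ideal(pn, p0, qn, q0)
print(round(F * F_reversed / 10000, 6))
```

With these figures the base-weighted index is the highest and the given-weighted index the lowest, while the two compromise formulas fall between them, which is the ordering the discussion of upward and downward bias would lead one to expect when purchases respond to relative prices in this way.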

A general criticism of any weighting system which involves the use of a
different set of weights for each index number is that, although each index
number may validly be compared with that of the base year, logically the
index numbers of no other two years (such as 1952 and 1953) can be
compared with each other. This criticism applies to given-year weights, to
the average of base- and given-year weights, to the highest-common-factor
method when the quantities selected are common only to the two
years being compared, and to the "ideal" index number. It does not
apply to base-year weights, average weights of all years, typical weights,
or the highest-common-factor method when the quantities common to
all years are used.

⁴ Ibid., pp. 105-109.

⁵ See Irving Fisher, The Making of Index Numbers, Houghton Mifflin Company,
Boston, 1927, p. 220. In Chapter IV Professor Fisher discusses these tests.

Although the theory of weight selection is interesting and involves
logical analysis of a high order, it is easy to overestimate its practical
importance. Consider the following results obtained from the citrus fruit data:

System of weighting                              Index number, 1953
Simple aggregative                                     119.4
1948 quantity weights (base-year weights)              123.7
1948-1950 average quantity weights                     124.2
1953 quantity weights (given-year weights)             124.5
"Ideal" index number                                   124.1


In this case there is a very great difference between the simple and the
weighted index numbers, but little difference between the systems of
weighting. The different weight systems substantially agree because the
importance of the weights relative to each other was about the same in
the four systems. If, however, both the prices and quantities had varied
greatly in their relative magnitude, the different weightings might have
given markedly different results. If all prices moved in the same
direction and changed at the same ratio, it would make no difference what
system of weighting were chosen. But if it so happens that commodities
which are changing greatly in relative importance during the period are
also undergoing price changes materially different from the average, then
the matter of weighting becomes important. It is usually of slight
importance whether exact weights are used, or only approximate weights.
Thus, Table 17.5 is exactly like Table 17.4 except that the quantity
weights are rounded to one digit, but the results vary by only 0.25. The
explanation is that the rounding did not appreciably change the relative
importance of the weights. For all practical purposes, sufficiently
accurate results will usually be obtained if exact weights are given to
the few more important commodities, and rounded weights to the
numerous unimportant commodities.⁶
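
The comparison of exact and rounded weights can be reproduced as follows. This is a sketch under stated assumptions: the average production figures and prices are those printed in Table 17.4, and the helper `round_to_one_digit` is a name introduced here for ordinary rounding to one significant digit (not Fisher's geometric rule).

```python
import math

# Order: grapefruit, lemons, Florida oranges, navel, Valencia.
P_1948 = [3.30, 6.82, 3.41, 5.16, 4.32]
P_1953 = [4.40, 7.61, 4.36, 5.33, 5.77]
# Average production 1948-1950 in thousands of boxes, as printed in Table 17.4.
AVERAGE_Q = [29130, 11410, 58400, 15480, 26090]


def round_to_one_digit(x):
    """Round a positive number to one significant digit."""
    magnitude = 10 ** math.floor(math.log10(x))
    return round(x / magnitude) * magnitude


def aggregative_index(p0, pn, q):
    """100 * sum(p_n * q) / sum(p_0 * q)."""
    return 100 * sum(p * w for p, w in zip(pn, q)) / sum(
        p * w for p, w in zip(p0, q)
    )


rounded_q = [round_to_one_digit(q) for q in AVERAGE_Q]

index_exact = aggregative_index(P_1948, P_1953, AVERAGE_Q)    # Table 17.4
index_rounded = aggregative_index(P_1948, P_1953, rounded_q)  # Table 17.5

print(rounded_q)
print(round(index_exact, 1), round(index_rounded, 2))
```

Although some individual weights change by as much as a fourth, their relative importance is little disturbed, and the two index numbers differ only in the first decimal place.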

⁶ Irving Fisher recommends that the quantities be rounded to 1, 10, 100, or 1,000.
This, of course, materially lightens the work. In rounding any quantity between 1
and 10 (for instance), the dividing point is not the arithmetic mean of these two
numbers, but the geometric mean, 3.1623, since this involves the smallest relative error.
See ibid., pp. 346 and 432.

Although only approximate accuracy is necessary in choosing weights,
accuracy in price quotations is, in practice, of much greater importance.
This, of course, results from the fact that some prices are apt to show
marked changes from year to year, while others change little. This is
the same as saying that the ratio of the prices to each other changes from
year to year.

TABLE 17.5

Construction of 1953 Aggregative Index Number of Citrus Fruit Prices,
Weighted by Average Production* in 1948, 1949, and 1950 Rounded to One
Digit

(Production in thousands of boxes; values in thousands of dollars.)

                                    Average                          Value of 1948-1950
                                   production     Price per box      average production
                                   1948-1950                            at price in
Fruit                               rounded       1948     1953       1948       1953
Grapefruit                           30,000      $3.30    $4.40      99,000    132,000
Lemons                               10,000       6.82     7.61      68,200     76,100
Oranges, Florida                     60,000       3.41     4.36     204,600    261,600
Oranges, California, Navel           20,000       5.16     5.33     103,200    106,600
Oranges, California, Valencia        30,000       4.32     5.77     129,600    173,100
Aggregate value                         ...        ...      ...     604,600    749,400
Index number (per cent of 1948)         ...        ...      ...       100.0     123.95

* See note to Table 17.2 concerning crop years.
Data from sources given below Table 17.4.


Over a number of years, various changes take place: commodities shift
considerably in their relative importance; old commodities disappear from
use and are succeeded by new commodities; models, styles, or grades of a
commodity become obsolete and cease to be manufactured, with new
models, styles, or grades taking their place; marketing centers shift, so
that a price quotation at the new center must replace that at the old;
f.o.b. price quotations may give way to delivered prices, or vice versa.
Under any of these circumstances it may be desirable to express each
index number, not as a percentage of the original base, but as a percentage
of the preceding period. Such an index might employ any of the
formulas given above, utilizing weights pertaining to either or both of the
years or months being compared. Frequently these separate percentages
are chained back to the original base by a process of successive
multiplication. Such an index, known as a chain index, will be further described
in the following chapter. When substituting one commodity for another,
or when changing weights, overlapping data are needed for only a single
period, as a direct comparison is made only between the prices (or
quantities) of the current period and those of the preceding period.

AVERAGES OF PRICE RELATIVES

Two basic steps are involved in constructing indexes by averaging price
relatives.

1. Convert the actual prices for each series to percentages of the base
period. These percentages are called price relatives, since they are
expressed, not in dollars and cents, but as percentages relative to the
price in the base period. The upper part of Table 17.6 shows the price
relatives for the five citrus fruits from 1948 through 1953. Each of these
series of relatives was computed in the same manner as were the relatives
for Florida oranges in Table 17.1, which are here repeated in the third
row of figures in Table 17.6.


TABLE 17.6

Construction of Index Numbers of Citrus Fruit Prices, 1948-1953,* by Use of
Simple Arithmetic Mean of Price Relatives

Fruit                               1948     1949     1950     1951     1952     1953
Grapefruit                         100.0    121.2    161.2    130.6    121.5    133.3
Lemons                             100.0    115.1    112.9    109.2    115.0    111.6
Oranges, Florida                   100.0    128.4    146.6    130.5    111.7    127.9
Oranges, California, Navel         100.0    128.3    101.4    111.8    136.6    103.3
Oranges, California, Valencia      100.0    122.9    118.5    127.3    129.2    133.6
Total                              500.0    615.9    640.6    609.4    614.0    609.7
Average (per cent of 1948)         100.0    123.2    128.1    121.9    122.8    121.9

* See note to Table 17.2 concerning crop years.
Based on data in Table 17.2.


TABLE 17.7

Construction of Index Numbers of Citrus Fruit Prices, 1948-1953,* by Use of
Arithmetic Means of Price Relatives Weighted by Base-Year (1948) Values

(Values in thousands of dollars.)

                                    1948     Price relative of specified year multiplied by 1948 value
Fruit                               value       1948     1949     1950     1951     1952     1953
Grapefruit                        108,900    108,900  131,987  175,547  142,223  132,314  145,164
Lemons                             87,773     87,773  101,027   99,096   95,848  100,939   97,955
Oranges, Florida                  199,144    199,144  255,701  291,945  259,883  222,444  254,705
Oranges, California, Navel         97,524     97,524  125,123   98,889  109,032  133,218  100,742
Oranges, California, Valencia     116,338    116,338  142,979  137,861  148,098  150,309  155,428
Total                             609,679    609,679  756,817  803,338  755,084  739,224  753,994
Index number (per cent of 1948)       ...      100.0    124.1    131.8    123.8    121.2    123.7

* See note to Table 17.2 concerning crop years.

Based on price relatives in Table 17.6 and 1948 value data in Table 17.3.

2. Average the price relatives for each year separately, thus obtaining a
series of index numbers. In the lower part of Table 17.6 a simple
arithmetic mean of the relatives has been used. The shortcoming of this
method is that each relative (irrespective of the importance of the
commodity which it represents) influences the index number for a given
year according to its percentage of increase or decrease over the base
period. Chart 17.3 shows the index and the five series of price relatives.
From this chart it may be seen that in 1950 two relatives increased, while
three declined, but the index rose because the two relatives which
increased more than offset the three which declined. The two relatives
which increased might have represented minor components of the index;
the result would have been the same.

[Chart 17.3. Simple Arithmetic Average Index Number of Citrus Fruit
Prices and Price Relatives of Each of the Five Fruits, 1948-1953. 1948 = 100.
Vertical scale in per cent. Data from Table 17.6.]

It may be worth while to point
out that the simple arithmetic mean of price relatives is equivalent to a
weighted aggregative index, where the weights are the amount of each
commodity purchasable by $1.00 (or any specified amount) in the base
year. This is the same as weighting by the reciprocals of base-year
prices.
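
The two steps, and the equivalence just stated, may be sketched in Python. The prices are those of Table 17.2; the function names are assumptions made for the illustration. The relatives are carried unrounded here, whereas Table 17.6 averages relatives already rounded to one decimal, so agreement with the table to one decimal is a check rather than a necessity.

```python
YEARS = [1948, 1949, 1950, 1951, 1952, 1953]
PRICES = {
    "Grapefruit": [3.30, 4.00, 5.32, 4.31, 4.01, 4.40],
    "Lemons": [6.82, 7.85, 7.70, 7.45, 7.84, 7.61],
    "Oranges, Florida": [3.41, 4.38, 5.00, 4.45, 3.81, 4.36],
    "Oranges, California, Navel": [5.16, 6.62, 5.23, 5.77, 7.05, 5.33],
    "Oranges, California, Valencia": [4.32, 5.31, 5.12, 5.50, 5.58, 5.77],
}


def price_relatives(series, base=0):
    """Step 1: express each price as a percentage of the base-period price."""
    return [100 * p / series[base] for p in series]


relatives = {name: price_relatives(series) for name, series in PRICES.items()}

# Step 2: simple arithmetic mean of the relatives, year by year.
simple_mean_index = [
    round(sum(rel[t] for rel in relatives.values()) / len(relatives), 1)
    for t in range(len(YEARS))
]


def aggregative_index(prices, weights, t, base=0):
    """100 * sum(p_t * q) / sum(p_base * q)."""
    numerator = sum(prices[name][t] * weights[name] for name in prices)
    denominator = sum(prices[name][base] * weights[name] for name in prices)
    return 100 * numerator / denominator


# Quantity purchasable by $1.00 in the base year = reciprocal of base price.
reciprocal_weights = {name: 1 / series[0] for name, series in PRICES.items()}
aggregative_equivalent = [
    round(aggregative_index(PRICES, reciprocal_weights, t), 1)
    for t in range(len(YEARS))
]

print(dict(zip(YEARS, simple_mean_index)))
print(dict(zip(YEARS, aggregative_equivalent)))
```

Both lines of output are the same, since dividing each price by its own base-year price is exactly what the reciprocal weights do inside the aggregate.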

It is, of course, possible to use averages other than the arithmetic
mean, for example, the geometric mean, the median, or the harmonic
mean, and some attention will be given to this topic later. More
important, however, is the application of weights to the relatives. These
weights should be value weights, in contrast to the quantity weights used
with the aggregative method. The reason for this will be apparent
shortly. Table 17.7 shows the computation of an index of citrus fruit
prices with the relatives of Table 17.6 weighted by the value of each fruit
in the base year, 1948. As is apparent from the table, the procedure
consists of: (1) multiplying the relatives by their weights, (2) summing these
products year by year, and (3) dividing these totals for each year by the
sum of the weights. Except for differences due to rounding, the results
are the same as those obtained for the aggregative index with
base-year-quantity weights (Table 17.3). That this should be so can be
demonstrated simply. Let us first take a single commodity, Florida oranges,
and show that (A) the base-year (1948) value weight applied to the
given-year (1953) relative produces the same result as (B) the base-year (1948)
quantity times the given-year (1953) price. That is:

(A) The price relative for 1953 is $4.36 ÷ $3.41 = 1.2786, or 127.86 per cent;
    the base-year value times the 1953 price relative is
    $199,144,000 × 1.2786 = $254,626,000.

(B) The base-year quantity times the given-year price is
    58,400,000 × $4.36 = $254,624,000.

(Table 17.7 shows $254,705,000 for Florida oranges for 1953 because the
1953 relative was taken as 127.9.)

This relationship is true, not only for each individual commodity, but
for groups of commodities⁷ as well. In symbols:

    \frac{\sum \left( \frac{p_n}{p_0} \right) p_0 q_0}{\sum p_0 q_0} = \frac{\sum p_n q_0}{\sum p_0 q_0}.
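
The identity can be verified numerically for the five fruits taken together. The sketch below uses the 1948 and 1953 prices of Table 17.2 and the 1948 quantities of Table 17.3; the variable names are assumptions introduced for illustration.

```python
# Order: grapefruit, lemons, Florida oranges, navel, Valencia.
P_1948 = [3.30, 6.82, 3.41, 5.16, 4.32]
P_1953 = [4.40, 7.61, 4.36, 5.33, 5.77]
Q_1948 = [33000, 12870, 58400, 18900, 26930]  # thousands of boxes

# Base-year value weights p_0 * q_0 (thousands of dollars).
base_values = [p0 * q0 for p0, q0 in zip(P_1948, Q_1948)]
# Unrounded price relatives p_n / p_0.
relatives = [pn / p0 for pn, p0 in zip(P_1953, P_1948)]

# Arithmetic mean of relatives weighted by base-year values.
weighted_mean_of_relatives = (
    100 * sum(r * v for r, v in zip(relatives, base_values)) / sum(base_values)
)

# Aggregative index weighted by base-year quantities.
aggregative = (
    100
    * sum(pn * q0 for pn, q0 in zip(P_1953, Q_1948))
    / sum(p0 * q0 for p0, q0 in zip(P_1948, Q_1948))
)

print(round(weighted_mean_of_relatives, 1), round(aggregative, 1))

# Rounding the relative first, as in Table 17.7, shifts a single product:
florida_from_rounded_relative = round(199144 * 1.279)  # relative taken as 127.9
florida_direct = round(58400 * 4.36)                   # quantity times price
print(florida_from_rounded_relative, florida_direct)
```

With unrounded relatives the two index numbers agree to within the accuracy of the arithmetic; the small discrepancies between Tables 17.3 and 17.7 arise only because the relatives were rounded before being multiplied by their weights.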

⁷ More generally, the following relationships may be stated with regard to price
index numbers:

(1) An arithmetic average of relatives weighted by base-year values (p_0 q_0) is the
equivalent of an aggregative index weighted with base-year quantities.

(2) Similarly, an arithmetic average of relatives weighted by the product of
base-year prices and given-year quantities (p_0 q_n) is the equivalent of an aggregative index
weighted with given-year quantities.

(3) A harmonic average of relatives weighted by given-year values (p_n q_n) is the
equivalent of an aggregative index weighted with given-year quantities. Thus,

    \frac{\sum p_n q_n}{\sum \left( \frac{p_0}{p_n} \right) p_n q_n} = \frac{\sum p_n q_n}{\sum p_0 q_n}.

(4) Similarly, it may be shown that a harmonic average of relatives weighted by the
product of base-year quantities and given-year prices (p_n q_0) is the equivalent of an
aggregative index weighted with base-year quantities.

These generalizations may be stated in the form of guides to the construction of
index numbers, when the index numbers are to be constructed from relatives:

(a) If it is desired to use the arithmetic average of relatives, the value weights should
be the products of the base prices and whatever quantities are desired.

(b) If it is desired to use an average of relatives employing value weights that are
the product of given-year prices and quantities of some period, the harmonic average
should be used.

Under no circumstances should the arithmetic average of relatives be used with
values involving given-year prices, since this gives extra weight to a commodity
merely because it has gone up in price. Such a procedure results in an upward bias.

Evidently the method of weighted average of relatives with
base-year-value weights is usually a roundabout method of doing what may more
easily be accomplished by direct means using aggregates with
base-year-quantity weights. Furthermore, the meaning of an aggregative index
seems clearer to most persons than does an average of relatives. Why,
then, should not the aggregative method always be used? One reason is
that the price relatives themselves are occasionally worth studying, not
only because an individual series may hold special significance for the
reader, but because a study of groups of relatives may assist in selecting
a sample or determining what group indexes to make. In connection
with frequency distributions, it was observed that an average never gives
a complete picture of any situation. Other measures may be worth
making. Another reason is that the series to be combined can sometimes
be obtained only in the form of relatives, or they may have meaning only
as relatives because, as in the case of quantity indexes, a series may
consist of several subseries expressed in different physical units. The use of
relatives is more common in the construction of quantity indexes (to be
discussed later) than in the making of price indexes, since the components
of quantity indexes are themselves often indexes or relatives.

Commodity weights versus group weights. The same practical
advice may be offered concerning value weights that was given concerning
quantity weights: only approximate accuracy is necessary. Nevertheless,
the following consideration becomes important when only a limited
number of commodities is chosen: Should the value weight selected for
any given commodity be the value of that commodity entering the market,
or should it refer to the whole group of commodities which the commodity
represents? The answer to this question is that, unless it is practicable
to increase the number of items in some groups (and perhaps decrease the
number in others) sufficiently to obtain proportionate value
representation for the different groups, it is decidedly better to adjust the weights
of the different items so as to obtain such group representation. Most
satisfactory results will be obtained if we select as large a number of
commodities from each group as feasible, and at the same time give
additional weight to those elements that are under-represented.

Another method of accomplishing the same result is to select as many
commodities as convenient for each group, to compute separate group
indexes, and then to combine the group indexes into a general index, using
the appropriate weights. Since the group indexes are relatives, their
combination presents no new problem. It might further be noticed that
weighting of commodities may in a sense be regarded as a substitute for
selecting the number of commodities from the different groups in
proportion to the value of those groups.

Types of averages. The geometric mean. Sometimes it is argued 
that the geometric mean should be used for averaging price relatives. 

Let us consider a simple case using only two commodities and involving 
the measurement of price level between two countries. Using Country 
A as the base, we get the following results, showing that, according to 
the arithmetic mean, the price level in Country B is 25 per cent higher 
than in Country A. 


                            Country A                      Country B
                       Unit      Price relative       Unit      Price relative
Commodity              price      (per cent)          price      (per cent)
Wheat (bushel)         $0.80         100              $1.60         200
Cotton (pound)           .12         100                .06          50
Arithmetic mean          ...         100                ...         125
Geometric mean           ...         100                ...         100

Now let us see what happens if Country B is taken as the base and
the price level in Country A is expressed relative to that of Country B.

                            Country A                      Country B
                       Unit      Price relative       Unit      Price relative
Commodity              price      (per cent)          price      (per cent)
Wheat (bushel)         $0.80          50              $1.60         100
Cotton (pound)           .12         200                .06         100
Arithmetic mean          ...         125                ...         100
Geometric mean           ...         100                ...         100

From these calculations, the arithmetic mean indicates that the price
level in Country A is 25 per cent higher than in Country B.

The results of the computations in the two tables appear to be
inconsistent. However, they are inconsistent, not because of a shortcoming
of the arithmetic mean, but because of hidden weights which are not the
same in the two situations. When Country A was the base, it was
assumed that the amounts of wheat and cotton purchased in Country A
would be the number of units of wheat (1¼ bushels) and the number of
units of cotton (8⅓ pounds) purchased by $1.00 (or other specified
amount of money), and that the same weights would hold for Country B.
That is, for Country A:

    1¼ bushels of wheat @ $0.80 = $1.00; relative = 100;
    8⅓ pounds of cotton @   .12 =  1.00; relative = 100;

and for Country B:

    1¼ bushels of wheat @ $1.60 = $2.00; relative = 200;
    8⅓ pounds of cotton @   .06 =   .50; relative =  50.

On this basis, the price level in Country B is 25 per cent higher than in
Country A.

When Country B was the base, it was assumed that the amounts of
wheat and cotton purchased in Country B would be the number of
units of wheat (⅝ bushel) and the number of units of cotton (16⅔
pounds) purchased by $1.00 (or other specified amount of money),
and that the same weights would hold for Country A.

This gives, for Country B:

    ⅝ bushel of wheat    @ $1.60 = $1.00; relative = 100;
    16⅔ pounds of cotton @   .06 =  1.00; relative = 100;

and for Country A:

    ⅝ bushel of wheat    @ $0.80 = $0.50; relative =  50;
    16⅔ pounds of cotton @   .12 =  2.00; relative = 200.

Use of this set of weights indicates that the price level in Country A is
25 per cent higher than in Country B.

Now, the geometric mean is sometimes advocated because it gives
consistent results in situations such as those shown in the two tables above.
The results are consistent because, with either country as the base, the
index number for the other country is 100, as may be seen in the tables.
But the geometric mean yields consistent results only because of the
assumption inherent in it. This is that the value of the two
commodities purchased be in the same ratio in the two countries. This
means that more wheat would be bought in Country A than in Country
B, and that more cotton would be bought in Country B than in A.
In the foregoing paragraphs, no weights had been specified for the 
index numbers which were made. We have already seen that relatives 
should be weighted by properly selected values, and for the illustrations 
just given those weights should be determined upon the basis of the 
actual value of the commodities sold in the two countries. 

Another argument for the geometric mean is based upon the assertion
that frequency distributions of price relatives tend to form a normal
distribution when plotted on paper having a logarithmic X scale. Such
a frequency distribution, but not of price relatives, is shown in Charts
23.18 and 23.14. The reasoning runs as follows: the doubling of a price
represents as important a divergence (and is as likely to occur) as a
decline to one-half of its former level; it is as likely to increase to 3/2 of
the base period as to fall to 2/3 of the base period; it is as likely to rise to
infinity as it is to fall to zero. The resulting frequency distribution
therefore tends to be normal geometrically, and the geometric mean,
which coincides with the mode of such a distribution, is the
appropriate average. This argument is logical but is based upon premises
that are not fully established. We are not sure that a price is as likely
to double as to drop one-half, or as likely to increase 50 per cent as to
drop one-third; and, unless balancing of this sort takes place, we do not
have an appropriate basis for using the geometric mean.

It should not be thought that the geometric mean must never be used; 
it merely is to be doubted that it has any inherent general superiority 
over the arithmetic mean. It is the belief of the authors that the aver- 
age to use is determined in large part by the use for which the index 
numbers are intended. If, as is very often the case, we wish to compare 
the amount of money required at two different times or in two different 
places to purchase the same commodities (or perhaps the same amount 
of satisfaction by like individuals, with tastes and environment held con- 
stant), the weighted arithmetic mean should be used. This is because, 
as has been shown, such an index number may also be regarded as a 
weighted aggregative index number. On the other hand, if the primary 
object is the study of price relatives, including their average behavior, 
the geometric mean may be useful. 

The mode, the median, and the harmonic mean. Use of the mode is 
virtually never advocated, the primary reason being that ordinarily no 
clearly defined mode would be present in a group of price relatives. The 
median is seldom used, but it might be appropriate if doubt exists 
concerning the accuracy or representative character of some of the data. 
Of course, the presence of such a doubt may actually mean that the basic 
data were not properly gathered. Use of the harmonic mean has been 
suggested by Ferger (see footnote 2 in Chapter 18) if it is desired to use 
the reciprocal of a price index as an index of the purchasing power of 
money. 

Comparison of the four types of price indexes. Before beginning 
the consideration of quantity indexes, it may be well to pause a moment 
and to compare the results of the four types of price indexes which have 
been discussed. Chart 17.4 shows these four indexes, but it has three 
curves rather than four, because two of the indexes coincide. As we 
already know, the two that are alike are the aggregative with 
base-year-quantity weights and the arithmetic average of relatives weighted by 
base-year values. Note the general agreement of all three curves, 
although there are some important differences in magnitude (for 
example, in 1950) and in direction (for example, in 1952). The simple 
aggregative and the simple arithmetic average of relatives, both of which 
have logical shortcomings, both failed to go high enough in four years 
and in two instances moved in the wrong direction. 

Chart 17.4. Index Numbers of Citrus Fruit Prices, as Obtained by Different 
Methods, 1948-1953. Data from Tables 17.2, 17.3, 17.6, and 17.7. 
(Vertical scale: per cent.) 

QUANTITY INDEX NUMBERS 

Aggregative type. An aggregative index number of quantity 
(physical volume) is the counterpart of the corresponding price index. 
Thus, the construction of a simple aggregative quantity index would 
involve the formula 

\[ Q = \frac{\sum q_n}{\sum q_o}, \]

and Table 17.8 shows the computation of such a quantity index for citrus 
fruits. Ordinarily, an index computed in this way is obviously illogical, 
since it involves adding quantities expressed in different units, such as 
tons, thousands of board feet, kilowatt hours, and so on. For the citrus 
fruit, it would have been possible to express all production in terms of 
pounds, but even this would not yield a satisfactory index, since the 
relative importance of each fruit in the economy would have been ignored. 

Using base-year prices as weights, the formula becomes 

\[ Q = \frac{\sum q_n p_o}{\sum q_o p_o}. \]

The construction of this weighted aggregative quantity index, with 
1948 = 100, is shown in Table 17.9. 

Just as the aggregative index number of price measures the changing 
value of a fixed aggregate of goods at varying prices, so the aggregative 
index number of physical volume measures the changing value of a 
varying aggregate of goods at fixed prices. The price index answers 
the question: If we buy the same assortment of goods each year, but 
at different prices, how much will we spend each year? The physical 
volume index answers the question: If we buy varying quantities of 
specified goods each year, but at the same price, how much will we spend 
each year? While in the former case the difference in amount spent was 
due to price change, in the latter case the difference must, of course, be 
attributed to changes in quantities bought and sold, since prices were 
held constant. Thus an index, computed by use of the formula last 
given, tells us the comparative quantities (produced, sold, consumed, and 
so forth) for each of the periods covered. 
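The weighted aggregative quantity index can be sketched in a few lines of Python. The quantities (thousands of boxes) and 1948 prices per box below are those of Tables 17.8 and 17.9; the 1953 grapefruit figure is taken as 32,500, the quantity implied by the 1953 value shown in Table 17.9. The variable and function names are ours, chosen only for illustration.

```python
YEARS = [1948, 1949, 1950, 1951, 1952, 1953]

# 1948 price per box, in dollars (Table 17.9).
PRICE_1948 = {
    "grapefruit": 3.30,
    "lemons": 6.82,
    "oranges_florida": 3.41,
    "oranges_calif_navel": 5.16,
    "oranges_calif_valencia": 4.32,
}

# Production in thousands of boxes, 1948-1953 (Table 17.8).
# 32,500 is the 1953 grapefruit quantity implied by Table 17.9.
QUANTITY = {
    "grapefruit": [33000, 30200, 24200, 33200, 36000, 32500],
    "lemons": [12870, 10010, 11360, 13450, 12800, 11900],
    "oranges_florida": [58400, 58300, 58500, 67300, 78600, 72200],
    "oranges_calif_navel": [18900, 11910, 15630, 14610, 12600, 16630],
    "oranges_calif_valencia": [26930, 25100, 26230, 30600, 25810, 28700],
}

def value_at_base_prices(t):
    """Sum of q_t * p_o over all fruits, in thousands of dollars."""
    return sum(PRICE_1948[f] * QUANTITY[f][t] for f in QUANTITY)

def quantity_index(t, base=0):
    """Aggregative quantity index, per cent of the base year."""
    return 100 * value_at_base_prices(t) / value_at_base_prices(base)

index = {year: round(quantity_index(t), 1) for t, year in enumerate(YEARS)}
print(index)
```

The printed values agree with the index-number row of Table 17.9: 100.0, 88.0, 90.3, 104.7, 106.7, and 105.7.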


TABLE 17.8 

Construction of Simple Aggregative Index Numbers of Citrus Fruit 
Production, 1948-1953* 

(Quantities in thousands of boxes.) 

Fruit                                 1948     1949     1950     1951     1952     1953 
Grapefruit                          33,000   30,200   24,200   33,200   36,000   32,500 
Lemons                              12,870   10,010   11,360   13,450   12,800   11,900 
Oranges, Florida                    58,400   58,300   58,500   67,300   78,600   72,200 
Oranges, California, Navel          18,900   11,910   15,630   14,610   12,600   16,630 
Oranges, California, Valencia       26,930   25,100   26,230   30,600   25,810   28,700 
Aggregate                          150,100  135,520  135,920  159,160  165,810  161,930 
Index number (per cent of 1948)      100.0     90.3     90.6    106.0    110.5    107.9 

* See note to Table 17.2 concerning crop years. 
Data from sources given below Table 17.2. 


TABLE 17.9 

Construction of Aggregative Index Numbers of Citrus Fruit Production, 
1948-1953, Weighted by Prices in 1948* 

(Values in thousands of dollars.) 

                                  1948     Value of amount produced in specified year at 1948 price 
                                 price 
Fruit                           per box     1948     1949     1950     1951     1952     1953 
Grapefruit                       $3.30   108,900   99,660   79,860  109,560  118,800  107,250 
Lemons                            6.82    87,773   68,268   77,475   91,729   87,296   81,158 
Oranges, Florida                  3.41   199,144  198,803  199,485  229,493  268,026  246,202 
Oranges, California, Navel        5.16    97,524   61,456   80,651   75,388   65,016   85,811 
Oranges, California, Valencia     4.32   116,338  108,432  113,314  132,192  111,499  123,984 
Aggregate value                          609,679  536,619  550,785  638,362  650,637  644,405 
Index number (per cent of 1948)            100.0     88.0     90.3    104.7    106.7    105.7 

* See note to Table 17.2 concerning crop years. 
Based on quantity data of Table 17.8 and 1948 price data in Table 17.2. 


Various methods of weighting are available for the construction of 
quantity index numbers, and in general the same considerations apply 
that were discussed in connection with price index numbers. In 
obtaining price weights which are averages of two or more years, the average 
prices should be weighted-average prices, obtained by dividing the total 
value sold in these years by the total number of units in those same years. 
Thus, if average quantities of base and given years are used, we have the 
rather formidable-looking formula 

\[ Q = \frac{\sum q_n \left( \dfrac{p_o q_o + p_n q_n}{q_o + q_n} \right)}{\sum q_o \left( \dfrac{p_o q_o + p_n q_n}{q_o + q_n} \right)}. \]

Likewise, if the common-factor method is used, the price weight should 
be derived from the largest value that is common to all the years in 
question. 


TABLE 17.10 

Construction of Index Numbers of Citrus Fruit Production, 1948-1953*, 
by Use of Simple Arithmetic Mean of Quantity Relatives 

Fruit                               1948    1949    1950    1951    1952    1953 
Grapefruit                         100.0    91.5    73.3   100.6   109.1    98.5 
Lemons                             100.0    77.8    88.3   104.5    99.5    92.5 
Oranges, Florida                   100.0    99.8   100.2   115.2   134.6   123.6 
Oranges, California, Navel         100.0    63.0    82.7    77.3    66.7    88.0 
Oranges, California, Valencia      100.0    93.2    97.4   113.6    95.8   106.6 
Total                              500.0   425.3   441.9   511.2   505.7   509.2 
Average (per cent of 1948)         100.0    85.1    88.4   102.2   101.1   101.8 

* See note to Table 17.2 concerning crop years. 
Based on data in Table 17.8. 


TABLE 17.11 

Construction of Index Numbers of Citrus Fruit Production, 1948-1953*, 
by Use of Arithmetic Means of Quantity Relatives Weighted by 
Base-Year (1948) Values 

(Values in thousands of dollars.) 

                                  1948     Quantity relative of specified year multiplied by 1948 value 
Fruit                            value      1948     1949     1950     1951     1952     1953 
Grapefruit                      108,900   108,900   99,644   79,824  109,553  118,810  107,266 
Lemons                           87,773    87,773   68,287   77,504   91,723   87,334   81,190 
Oranges, Florida                199,144   199,144  198,746  199,542  229,414  268,048  246,142 
Oranges, California, Navel       97,524    97,524   61,440   80,652   75,386   65,049   85,821 
Oranges, California, Valencia   116,338   116,338  108,427  113,313  132,160  111,452  124,016 
Total                           609,679   609,679  536,544  550,835  638,236  650,693  644,435 
Index number (per cent of 1948)             100.0     88.0     90.3    104.7    106.7    105.7 

* See note to Table 17.2 concerning crop years. 
Based on quantity relatives in Table 17.10 and 1948 value data in Table 17.9. 


Averages of relatives. This method of constructing quantity index 
numbers is strictly analogous to the method applied to the measuring of 
price changes. The procedure is illustrated by Tables 17.10 and 17.11. 
As was found to be true with price index numbers, the use of 
base-year-value weights produces the same result as the aggregative method 
employing base-year-quantity weights. 
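The equivalence just stated follows from the fact that each quantity relative, multiplied by its base-year value, reduces to the given-year quantity valued at the base-year price. A minimal Python sketch, using the 1948 and 1952 citrus figures of Tables 17.8 and 17.9 (the variable names are ours), confirms it numerically:

```python
from math import isclose

# Fruits in order: grapefruit, lemons, Florida oranges,
# California navel oranges, California Valencia oranges.
price_1948 = [3.30, 6.82, 3.41, 5.16, 4.32]       # dollars per box
qty_1948 = [33000, 12870, 58400, 18900, 26930]    # thousands of boxes
qty_1952 = [36000, 12800, 78600, 12600, 25810]    # thousands of boxes

# Base-year values p_o * q_o, used as weights for the relatives.
base_values = [p * q for p, q in zip(price_1948, qty_1948)]

# Aggregative method: sum(q_n * p_o) / sum(q_o * p_o).
aggregative = 100 * sum(p * q for p, q in zip(price_1948, qty_1952)) / sum(base_values)

# Average of relatives: sum((q_n / q_o) * p_o * q_o) / sum(p_o * q_o).
relatives = [qn / qo for qn, qo in zip(qty_1952, qty_1948)]
weighted_mean = 100 * sum(r * v for r, v in zip(relatives, base_values)) / sum(base_values)

print(round(aggregative, 1), round(weighted_mean, 1))
```

Both methods give 106.7 for 1952. Table 17.11 works from relatives already rounded to one decimal, which is why its column totals differ slightly from those of Table 17.9 while the index numbers agree.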
Because of ease of computation and simplicity of meaning, the 
aggregative method is to be preferred to the average-of-relatives method 
whenever it is applicable. As noted before, there are circumstances when the 
aggregative method cannot be used. Not previously mentioned is the 
situation that obtains when the relatives which are to be averaged are 
percentages, not of a fixed base but of a changing normal. Here, of 
course, the average-of-relatives method is necessary. In other words, the 
aggregative method cannot be used if an index of business cycles is to be 
constructed, since the data to be averaged are percentages of trend and 
seasonal. 

Usually the weights selected for an average of quantity relatives are in 
proportion to the values in exchange of the different series. Occasionally, 
some consideration is given also to the relative amplitude of the different 
series, if they are cyclical relatives. If an index is constructed, not for the 
purpose of measuring changes but for the purpose of forecasting changes, 
the basis of selecting will be, not the economic importance of the different 
series represented, but their importance for purposes of forecasting. 

Chapter 18 will describe methods of constructing a number of impor- 
tant indexes and will discuss certain points of technique and theory not 
covered in this chapter. 



Symbols Used in Chapter 18 


p: price of a commodity. 

P: price index number. 
q: quantity of a commodity. 

Q: quantity index number. 

n: a subscript indicating a given period or the current period. 
o: a subscript indicating the base period. 

Σ: upper-case Greek sigma, meaning “take the sum of.” 
u: units of purchasing power per dollar. 

Numerical subscripts to p and q, when written 53 or 47-49, for example, 
indicate that the price or quantity referred to is for the year specified 
or is the average (or total) for the years separated by the hyphen. 





CHAPTER 18 


Index Number Theory and Practice 


The object of this chapter is twofold. First, the theory of index num- 
bers and certain refinements of technique will be further discussed. Sec- 
ond, a description of a number of indexes will be given. The indexes were 
selected partly on account of their wide usefulness, and partly on account 
of the interesting technique which they employ. In general it will be 
found that in actual practice the procedures outlined in Chapter 17 will 
not be followed exactly, but that in each case there will be circumstances 
which justify special modifications of method. 

INDEX NUMBER CONCEPTS 

Mathematical tests. One school of thought on index numbers 
believes that there may be such a thing as a perfect index number formula, 
and that such a formula can be recognized by its ability to meet certain 
mathematical tests of consistency. Whether or not those tests are 
logically valid is an open question. Not only can an index be considered 
“ideal” if it meets those tests, according to this theory, but other indexes 
that do not meet them can be graded according to how closely they 
approximate them in actual practice. 

The tests are derived by the logic of analogy. Anything that is true of 
an individual commodity should also be true of a group of commodities 
considered as a whole. If a box of oranges was worth 125 per cent as 
much in 1953 as it was in 1948, then the 1948 price was 80 per cent of the 
1953 price. Reasoning by analogy, if an index number for 1953 was 125 
with respect to a 1948 base, then the index number for 1948 should be 
80 with respect to a 1953 base. In other words, an index number should 
work backward as well as forward. 

The first draft of this chapter was prepared by Dr. James P. Paris. 

Again, suppose that a commodity increases from 40 cents to 60 cents 
and that the sales increase from 2 units to 4 units. The price is 150 per 
cent of the base year, the quantity sales are 200 per cent, while the value 
is 1.50 × 2.00 = 3.00 times the base year, or 300 per cent of the base year. 
This is verified by noting that 

\[ \frac{0.60 \times 4}{0.40 \times 2} = 3.00. \]

Once more reasoning from analogy, it may be argued that a price index 
times a quantity index computed from the same data should equal the 
relative value of the transactions in the given year with respect to the 
base year. In other words, if 

\[ \frac{p_n}{p_o} \times \frac{q_n}{q_o} = \frac{p_n q_n}{p_o q_o}, \]

then it should be true that 

\[ P \times Q = \frac{\sum p_n q_n}{\sum p_o q_o}. \]

As indicated in the preceding paragraph, there are two tests which are 
considered especially important by the “mathematical test” school. 
These may be called: (1) the time reversal test; (2) the factor reversal test. 

The time reversal test may be stated more precisely as follows: If the 
time subscripts of a price (or quantity) index number formula be 
interchanged, the resulting price (or quantity) formula should be the reciprocal 
of the original formula. If we take the formula 

\[ \frac{\sum p_n q_o}{\sum p_o q_o} \]

and interchange the time subscripts, the resulting formula is 

\[ \frac{\sum p_o q_n}{\sum p_n q_n}. \]

But 

\[ \frac{\sum p_o q_n}{\sum p_n q_n} \neq \frac{\sum p_o q_o}{\sum p_n q_o}; \]

hence the test is not met. On the other hand, the formula 

\[ \sqrt{\frac{\sum p_n q_o}{\sum p_o q_o} \times \frac{\sum p_n q_n}{\sum p_o q_n}} \]

becomes 

\[ \sqrt{\frac{\sum p_o q_n}{\sum p_n q_n} \times \frac{\sum p_o q_o}{\sum p_n q_o}}; \]

the product of the two expressions is unity, and Irving Fisher’s “ideal” 
index meets the time reversal test. 
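The time reversal test lends itself to a numerical check. The sketch below is only an illustration: the two-commodity prices and quantities are hypothetical figures of our own, and the function names are not taken from any source. Each index is computed forward, then again with the time subscripts interchanged, and the two results are multiplied.

```python
from math import sqrt, isclose

def aggregate(p, q):
    """Sum of price times quantity over all commodities."""
    return sum(pi * qi for pi, qi in zip(p, q))

def base_weighted(p_base, p_given, q_base, q_given):
    """Aggregative price index with base-period quantity weights."""
    return aggregate(p_given, q_base) / aggregate(p_base, q_base)

def fisher_ideal(p_base, p_given, q_base, q_given):
    """Geometric mean of the base-weighted and given-weighted aggregatives."""
    base_w = aggregate(p_given, q_base) / aggregate(p_base, q_base)
    given_w = aggregate(p_given, q_given) / aggregate(p_base, q_given)
    return sqrt(base_w * given_w)

def time_reversal_product(index, p_o, p_n, q_o, q_n):
    """Forward index times the index with time subscripts interchanged."""
    forward = index(p_o, p_n, q_o, q_n)
    backward = index(p_n, p_o, q_n, q_o)
    return forward * backward

# Hypothetical data for two commodities.
p_o, p_n = [0.40, 1.00], [0.60, 0.90]
q_o, q_n = [2, 5], [4, 6]

print(time_reversal_product(base_weighted, p_o, p_n, q_o, q_n))
print(time_reversal_product(fisher_ideal, p_o, p_n, q_o, q_n))
```

For these figures the base-weighted aggregative gives a product of about 0.958 rather than unity, while the “ideal” formula gives unity apart from floating-point rounding.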

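The factor reversal test may be stated in this way: If the p and q factors 
in a price (or quantity) index formula be interchanged, so that a quantity 
(or price) index formula is obtained, the product of the two indexes should 
give the true value ratio 

\[ \frac{\sum p_n q_n}{\sum p_o q_o}. \]

Again taking the formula 

\[ \frac{\sum p_n q_o}{\sum p_o q_o}, \]

we transform it into 

\[ \frac{\sum q_n p_o}{\sum q_o p_o}. \]

This is a quantity index, but since 

\[ \frac{\sum p_n q_o}{\sum p_o q_o} \times \frac{\sum q_n p_o}{\sum q_o p_o} \neq \frac{\sum p_n q_n}{\sum p_o q_o}, \]

the test is not met. However, we find that 

\[ \sqrt{\frac{\sum p_n q_o}{\sum p_o q_o} \times \frac{\sum p_n q_n}{\sum p_o q_n}} \]

transforms into 

\[ \sqrt{\frac{\sum q_n p_o}{\sum q_o p_o} \times \frac{\sum q_n p_n}{\sum q_o p_n}}. \]

The product of these two “ideal” indexes is 

\[ \frac{\sum p_n q_n}{\sum p_o q_o}, \]

and the test is met. 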
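The factor reversal test may be verified in the same manner. In the following sketch the data are again hypothetical figures of our own, and the quantity index is obtained simply by passing the quantities where the prices were and the prices where the quantities were.

```python
from math import sqrt, isclose

def aggregate(p, q):
    """Sum of price times quantity over all commodities."""
    return sum(pi * qi for pi, qi in zip(p, q))

def base_weighted(x_base, x_given, w_base, w_given):
    """Aggregative index of x with base-period weights w."""
    return aggregate(x_given, w_base) / aggregate(x_base, w_base)

def fisher_ideal(x_base, x_given, w_base, w_given):
    """Geometric mean of base-weighted and given-weighted aggregatives."""
    base_w = aggregate(x_given, w_base) / aggregate(x_base, w_base)
    given_w = aggregate(x_given, w_given) / aggregate(x_base, w_given)
    return sqrt(base_w * given_w)

# Hypothetical data for two commodities.
p_o, p_n = [0.40, 1.00], [0.60, 0.90]
q_o, q_n = [2, 5], [4, 6]

value_ratio = aggregate(p_n, q_n) / aggregate(p_o, q_o)

def factor_reversal_product(index):
    """Price index times the quantity index got by interchanging p and q."""
    price_index = index(p_o, p_n, q_o, q_n)
    quantity_index = index(q_o, q_n, p_o, p_n)
    return price_index * quantity_index

print(value_ratio)
print(factor_reversal_product(base_weighted))
print(factor_reversal_product(fisher_ideal))
```

Here the value ratio is about 1.345; the product of the two base-weighted aggregatives is about 1.288, while the product of the two “ideal” indexes reproduces the value ratio.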

Fisher’s “ideal” index number is so called because it is one of an 
extremely limited number of indexes that meet both of these tests. 

Relationship of formula to use. The concept of an “ideal” index 
is attacked by index number students belonging to a different school of 
thought on the ground that the analyst cannot say exactly what the 
“ideal” index measures; he can only assert vaguely that it measures a 
change in the price level, or use some similar expression. To Willford I. 
King,¹ the logical procedure is to ask a specific question, and then to 
devise a formula which will answer that specific question. For instance, 
the formula \( \sum p_n q_o / \sum p_o q_o \) applied to retail prices compares 
the cost in the present year with the cost in the base year of supporting 
the physical scale of living which obtained in the base year. While this 
is a specific question, it may not be the most useful question to ask. Just 
what is an appropriate question to ask is an important problem facing the 
person conducting the investigation. In Chapter 17 Keynes was interpreted 
as believing it appropriate that, for measuring changes in the value of 
money, one should first seek an index number that would measure the 
changing cost of aggregates of goods yielding the same utility to similar 
groups of persons at two periods. Now the formula 
\( \sum p_n q_o / \sum p_o q_o \) assumes that, if their tastes do not change, 
people will continue to buy the same amounts of goods no matter how 
great the price rise or fall, while actually there is a shift from those items 
which are becoming more expensive to those which are becoming cheaper. 
This formula, then, would have an upward “bias,” since the cost of 
obtaining the same quantity of goods would be higher than the cost of 
obtaining the same quantity of utility. The formula 
\( \sum p_n q_n / \sum p_o q_n \), on the other hand, compares the cost of 
supporting one’s present physical scale of living with its cost in the base 
year. This formula, from the same point of view, has a downward “bias,” 
since no sensible person would have bought the same goods in the base 
year as he does now (even granting the same tastes and environment), 
because the relative prices of goods would have been different. The cost 
of obtaining the present year’s bill of goods in the base year would have 
been greater than the cost of obtaining the current year’s economic 
satisfactions. 

¹ See Willford I. King, Index Numbers Elucidated, Longmans, Green and Company, 
New York, 1930, especially Chapter III. The reader may also wish to refer to B. D. 
Mudgett, Index Numbers, John Wiley & Sons, Inc., New York, 1951, Chapter 4. 
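The direction of the two “biases” can be seen in a small numerical sketch. The figures below are hypothetical and are constructed so that buyers shift away from the commodity whose price has risen, which is the behavior assumed in the argument above; with data that do not show such a shift, the ordering of the two index numbers need not hold.

```python
from math import sqrt

def aggregate(p, q):
    """Sum of price times quantity over all commodities."""
    return sum(pi * qi for pi, qi in zip(p, q))

# Hypothetical data: the first commodity doubles in price and buyers
# shift part of their purchases to the second, whose price is unchanged.
p_o, p_n = [1.00, 1.00], [2.00, 1.00]
q_o, q_n = [10, 10], [6, 14]

# Base-year quantity weights: cost of the base year's bill of goods.
base_weighted = aggregate(p_n, q_o) / aggregate(p_o, q_o)

# Given-year quantity weights: cost of the present year's bill of goods.
given_weighted = aggregate(p_n, q_n) / aggregate(p_o, q_n)

# The "ideal" index: geometric mean of the two.
ideal = sqrt(base_weighted * given_weighted)

print(round(100 * base_weighted, 1),
      round(100 * given_weighted, 1),
      round(100 * ideal, 1))
```

For these figures the base-weighted index is 150.0 and the given-weighted index is 130.0, with the “ideal” index lying between them at about 139.6.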

Fisher’s “ideal” index formula is the geometric mean of two index 
numbers biased (or inappropriate) in opposite directions; and many persons 
hold that the average of two wrong answers does not necessarily give one 
right answer, even though the two errors are in opposite directions and 
even though the formula is internally consistent. On the other hand, it 
is doubtful that Keynes’ common-factor method will in actual practice 
answer Keynes’ question any better than (if as well as) the “ideal” index 
number. Changes in relative prices with consequent changes in relative 
quantities purchased may reduce the value of the common factor to a 
small proportion of the total goods bought. Nevertheless, it is still 
another attempt to arrive at a logical decision as to exactly what one is 
trying to measure. 

For purposes of measuring changes in the value of money (purchasing 
power of the dollar), it is customary to use the reciprocal of a price 
index. Ferger, however, argues that this is illogical.² Just as a price 
index averages together price changes of specific commodities, so a 
purchasing power index should average together changes in the purchasing 
power of the dollar for specific commodities. If the price of corn is $.50 
per bushel, the purchasing power of the dollar for corn is 2 bushels. 
Designating units of purchasing power per dollar by the symbol u, Ferger 
suggests this purchasing power index number formula: 

\[ \text{Purchasing power} = \frac{\sum \dfrac{u_n}{u_o}\, p_o q_o}{\sum p_o q_o}. \]

But since u = 1/p, we may write 

\[ \text{Purchasing power} = \frac{\sum \dfrac{p_o}{p_n}\, p_o q_o}{\sum p_o q_o}. \]

This expression is the reciprocal of the harmonic mean of price relatives 
weighted by base-year values, since the latter is 

\[ \frac{\sum p_o q_o}{\sum \dfrac{p_o}{p_n}\, p_o q_o}. \]

So Ferger’s formula is still in effect (though not in concept) the reciprocal 
of a price index, though not the usual index based on the arithmetic mean. 
Presumably it would be possible to alter somewhat the weighting system 
without doing violence to his concept. 
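The identity between the purchasing power formula and the reciprocal of the weighted harmonic mean can be checked numerically. The sketch below uses hypothetical prices and quantities of our own for two commodities (only the $.50 base price of corn echoes the text), and the variable names are illustrative.

```python
from math import isclose

# Hypothetical data: corn (per bushel) and a second commodity.
p_o = [0.50, 2.00]     # base-period prices
p_n = [0.75, 2.50]     # given-period prices
q_o = [100, 20]        # base-period quantities

weights = [p * q for p, q in zip(p_o, q_o)]     # base-year values p_o * q_o
u_o = [1 / p for p in p_o]                      # units purchased per dollar, base
u_n = [1 / p for p in p_n]                      # units purchased per dollar, given

# Purchasing power index: weighted arithmetic mean of the relatives u_n / u_o.
purchasing_power = (
    sum(w * un / uo for w, un, uo in zip(weights, u_n, u_o)) / sum(weights)
)

# Harmonic mean of price relatives p_n / p_o, weighted by base-year values.
relatives = [pn / po for pn, po in zip(p_n, p_o)]
harmonic = sum(weights) / sum(w / r for w, r in zip(weights, relatives))

# The usual price index: weighted arithmetic mean of the same relatives.
arithmetic = sum(w * r for w, r in zip(weights, relatives)) / sum(weights)

print(purchasing_power, 1 / harmonic, 1 / arithmetic)
```

The first two printed figures coincide (about 0.726), while the reciprocal of the arithmetic-mean price index is 0.72, so the two approaches to purchasing power do not in general give the same answer.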

If we accept the idea that the purpose of an index number determines 
its formula, we need not, necessarily, abandon the “ideal” formula. It 
would be possible to maintain that, although the formula is not a perfect 
solution to every index number problem, nevertheless there are purposes 
for which it is especially suited, as for instance the analysis of value 
changes into constituent price changes and quantity changes. However, 
it seemingly would have to be abandoned as a theoretically sound index 
if we take the position that every index number must answer a specific 
question couched in layman's English. 

² See Wirth F. Ferger, “Distinctive Concepts of Price and Purchasing Power Index 
Numbers,” Journal of the American Statistical Association, Vol. XXXI, June 1936, 
pp. 258-272. 

THE CHAIN INDEX 

In its simplest form, the chain index is one in which the figures for each 
year (or subperiod thereof) are first expressed as percentages of the 
preceding year. These percentages are then chained together by successive 
multiplication to form a chain index. Table 18.1 shows the computation 
of a weighted aggregative chain index of citrus fruit prices. As noted 
above the table, the prices are weighted by production in the first year of 
each pair of years. These products are summed for each year and each 
sum is expressed as a percentage of the sum for the preceding year, as 
shown in the next-to-the-last column of the table. The results of the 
“chaining” procedure are shown in the last column of the table. They 
are obtained as follows: (1) the 1949 percentage, 124.2, is the 1949 chain 
index number; (2) since the 1950 percentage figure is 8.0 per cent greater 
than 1949, the 1950 chain index number is 1.242 × 1.080 = 1.341, or 
134.1 per cent; (3) the 1951 percentage figure is 0.943 of the 1950 figure, 
so the chain index number for 1951 is 1.341 × 0.943 = 1.265, or 126.5 
per cent; and so on for the other years. 

TABLE 18.1 

Construction of Weighted Aggregative Chain Index of Citrus Fruit Prices* 
(For each pair of years, the weights are the productions in the first year. 
Values in thousands of dollars.) 

        Price × production in first year of each pair of years                  Per cent of 
                                       Oranges,    Oranges,                     preceding 
        Grape-             Oranges,    California, California,  Sum of          year of      Chain 
Year    fruit    Lemons    Florida     navel       Valencia     products        each pair    index 

1948   108,900    87,773   199,144      97,524     116,338      609,679          100.0       100.0 
1949   132,000   101,030   255,792     125,118     142,998      756,938          124.2       124.2 

1949   120,800    78,578   255,354      78,844     133,281      666,857          100.0 
1950   160,664    77,077   291,500      62,289     128,512      720,042          108.0       134.1 

1950   128,744    87,472   292,500      81,745     134,298      724,759          100.0 
1951   104,302    84,632   260,325      90,185     144,265      683,709           94.3       126.5 

1951   143,092   100,202   299,485      84,300     168,300      795,379          100.0 
1952   133,132   105,448   256,413     103,000     170,748      768,741           96.7       122.3 

1952   144,360   100,352   299,466      88,830     144,020      777,028          100.0 
1953   158,400    97,408   342,696      67,158     148,924      814,586          104.8       128.2 

* See note to Table 17.2 concerning crop years. 
Based on price data in Table 17.2 and production data in Table 17.8. 
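The chaining step described above amounts to a running product of the year-to-year percentages. A minimal Python sketch follows, using the per-cent-of-preceding-year figures of Table 18.1; the function name is ours. One departure from the hand computation is assumed here: the running product is carried unrounded and only the reported figure is rounded, which gives the same results for these data.

```python
# Per cent of preceding year for each pair of years (Table 18.1).
LINKS = {1949: 124.2, 1950: 108.0, 1951: 94.3, 1952: 96.7, 1953: 104.8}

def chain_index(links, base_year=1948):
    """Chain link relatives by successive multiplication (base year = 100)."""
    chain = {base_year: 100.0}
    level = 1.0
    for year in sorted(links):
        level *= links[year] / 100      # carry the unrounded product forward
        chain[year] = round(100 * level, 1)
    return chain

chain = chain_index(LINKS)
print(chain)
```

The resulting chain index numbers are 124.2, 134.1, 126.5, 122.3, and 128.2 for 1949 through 1953, as in the last column of the table.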

The advantages of a chain index are: (1) commodities may readily be 
dropped, if they are no longer relevant; (2) new commodities may be 
introduced; and (3) weights may be changed. Thus, account may 
readily be taken of basic changes in production, distribution, and 
consumption habits, of quality changes, of any hiatus in some of the data, 
and of other similar changes that cannot readily be handled in a fixed- 
base index number. The principle of the chain index is employed in 
several instances later in this chapter. 

The disadvantage of the chain index is that, while the percentage-of- 
previous-year figures give accurate comparisons of year-to-year changes, 
the long-range comparisons of the chained percentages are not strictly 
valid. However, when the index-number user wishes to make year-to- 
year comparisons, as is so often done by the business man, the percen- 
tages of the preceding year provide a flexible and useful tool. 

SUBSTITUTING NEW COMMODITIES AND CHANGING WEIGHTS 

Sometimes it is necessary or desirable to drop a commodity from an 
index, to add a new commodity, to substitute one commodity for another, 
or to change the weight of a commodity. Substituting one commodity 
for another will ordinarily involve also a change of weight. These 
adjustments involve an application of the chain index. As an illustration 
of substitution, we shall construct an index of the producers' price of 
grapefruit for the years 1948 (the base year), 1951, 1952, and 1953. 

A fairly satisfactory index of the producers' price of grapefruit can be 
made, through 1951, using Florida seedless grapefruit, other Florida 
grapefruit, and Texas grapefruit. However, in 1952 (that is, the 1951- 
1952 season) because of a freeze, the Texas grapefruit crop amounted to 
only about 200,000 boxes and the price soared to $3.89 per box. Again, 
in 1953 the Texas crop was only 400,000 boxes and the price was $2.34 per 
box. For the purposes of our illustration, we shall substitute Arizona 
grapefruit for Texas grapefruit in 1951. 

Table 18.2 shows the computation of a weighted aggregative index for 
1948 and 1951 using base-year-quantity weights, and it may be seen 
that the 1951 index number is 225.34 for the “old series” using Florida 
and Texas grapefruit. The substitution of Arizona for Texas grapefruit 
is made, in 1951, by multiplying the Arizona grapefruit price by the 
Texas weight, giving the product shown in the table: 15.776 million 
dollars. The total of the products for the 1951 “new series” is 53.250 
million dollars, and this total is set equal to the already determined 1951 
index number, 225.34. The 1952 and 1953 products for Arizona 
grapefruit are determined as was the figure for 1951, and sums of products are 
gotten for 1952 and 1953. The index numbers for 1952 and 1953 are 
then obtained by these relationships: 

For 1952: 

\[ \frac{53.250}{225.34} = \frac{45.140}{\text{index number for 1952}}, \qquad \text{index number for 1952} = 191.02. \]

For 1953: 

\[ \frac{53.250}{225.34} = \frac{51.566}{\text{index number for 1953}}, \qquad \text{index number for 1953} = 218.21. \]


The procedure used in Table 18.2 may seem to underweight Arizona 
grapefruit because its unit price in 1951 was less than that of Texas 
grapefruit.³ As a matter of fact, the weight given to Arizona grapefruit 
(23.2 million boxes) was too great in 1951, when only 7.5 million boxes of 
Texas and 3.15 million boxes of Arizona grapefruit were produced, and 
even more exaggerated in 1952, when only 0.2 million boxes of Texas and 
2.14 million boxes of Arizona grapefruit were produced. Notice the 
extent to which the 1952 index number reflects the increase in price of the 
over-weighted Arizona grapefruit. Obviously, the weights should 
have been revised when Arizona grapefruit was substituted for Texas 
grapefruit.⁴ 

Such a revision of weights is made in Table 18.3. Here the 1951 index 
number for the “old series” is again 225.34. The “new series” of 
weighted aggregates for 1951 uses 1951 quantity weights, and the sum of 
the products for the “new series” for 1951 is 40.098 million dollars, which 
is set equal to an index number of 225.34. The index numbers for 1952 
and 1953 are then obtained, as before, from the relationship: 


³ When it is reasonable to continue to use the base-year weight for a substitute 
commodity, an adjustment for the different unit prices of the old and new commodity 
may be made by computing: 

Old Unit Price 

The procedure is then similar to that given in Table 18.3. See the first edition of this 
text, pp. 623-626. 

⁴ Actually, there should have been an earlier downward revision of the weight of 
Texas grapefruit, since production was only 6.4 million boxes in 1950. The procedure 
would be similar to that in Table 18.3. 



TABLE 18.3 

(Quantity weights in millions of boxes; prices in dollars per box; products 
in millions of dollars.) 

Old series, 1948 quantity weights 

                              1948                   1948                   1951 
                            quantity      1948      product      1951      product 
                            weights      price      p48 q48     price      p51 q48 
Grapefruit                    q48         p48                    p51 
Florida, seedless             14.8    [illegible] [illegible]    1.29      19.092 
Florida, other                18.2    [illegible] [illegible]    1.01      18.382 
Texas                         23.2    [illegible] [illegible]     .99      22.968 
Total                                               26.823                 60.442 
Index number, old series                            100.00                 225.34 

New series, 1951 quantity weights 

                              1951               1951               1952               1953 
                            quantity    1951    product    1952    product    1953    product 
                            weights    price    p51 q51   price    p52 q51   price    p53 q51 
Grapefruit                    q51       p51                p52                p53 
Florida, seedless             15.8      1.29    20.382     1.02    16.116     1.29    20.382 
Florida, other                17.4      1.01    17.574      .58    10.092      .79    13.746 
Arizona                        3.15      .68     2.142      .84     2.646      .78     2.457 
Total                                           40.098             28.854             36.585 
Index number, new series                        225.34             162.15             205.60 

* See note to Table 17.2 concerning crop years. Prices are those at intake packing-house door. 
Data from sources given below Table 18.2. 
For 1952: 

\[ \frac{40.098}{225.34} = \frac{28.854}{\text{index number for 1952}}, \qquad \text{index number for 1952} = 162.15. \]

For 1953: 

\[ \frac{40.098}{225.34} = \frac{36.585}{\text{index number for 1953}}, \qquad \text{index number for 1953} = 205.60. \]

Dropping a commodity without adding a new one or adding a new 
commodity which is not a substitute for an old one would, of course, 
involve a change of weights. The procedure would be similar to that of 
Table 18.3. Changing weights without adding or dropping a commodity 
could also be handled in similar fashion. 


DESCRIPTIONS OF INDEXES 

The remainder of this chapter will be devoted to brief descriptions of a 
number of indexes designed to measure price changes, changes in physical 
volume, general and specific business movements, and other changes and 
differences. No index is explained in full detail, and the reader should 
bear in mind that a two- or three-page description of an index can do 
little more than mention some of the more important features of that 
index. Some idea of the amount of compression involved may be had
when it is realized that the official description of the Federal Reserve
Index of Industrial Production covers 41 pages, exclusive of the accompanying
detailed tables.


PRICE INDEXES 

The Consumer Price Index. This index,® compiled by the United
States Bureau of Labor Statistics on a 1947-1949 base, is entitled “Index
of Changes in Prices of Goods and Services Purchased by City Wage-
Earner and Clerical-Worker Families to Maintain Their Level of Living.”
It is generally referred to as “The Consumer Price Index” and, as its
name indicates, is a statistical measure of retail price change. It is not,


® This description is based on U. S. Bureau of Labor Statistics, The Consumer Price
Index: A Short Description of the Index as Revised, and The Consumer Price Index:
A Layman’s Guide.

strictly speaking, a cost-of-living index, since it does not measure changes
in kinds and amounts of goods and services which people buy or the total
amount they spend for living. Neither does it measure differences in
living costs between different places.

The retail prices which make up the index are divided into eight major
groups: food, housing, apparel, transportation, medical care, personal 
care, reading and recreation, and other goods and services. Food and 
housing are further divided into subgroups. The approximately 300 
commodities and services which are included were selected as being 
representative of the price trends of subgroups of related items and 
include the cost of such diverse commodities and services as rice, pork 
chops, canned salmon, potatoes, men’s topcoats, men’s work gloves,
women’s wool suits, rent, mortgage interest, electricity, sheets, tablecloths,
automobiles, gasoline, office visits to (and home visits by) physicians,
eyeglasses, and haircuts. The 300 commodities are representative
of the “market basket” of goods and services comprising the pattern of
living of city workers’ families (of 2 or more persons) in 1952 and were 
selected as the result of an “expenditures survey” of 8,000 families of 
wage earners and clerical workers in 97 cities. 

Price data of the 300 commodities and services are collected from 46 
cities, selected so as to be representative of those city characteristics 
which affect the way in which families spend their money. Thus, such 
factors as size, climate, population density, and income level are taken 
into consideration. Within each city, price quotations are gotten from 
those sources from which families of wage and salary workers obtain
goods and services. For items purchased from stores, for example, 
quotations are obtained from representative chain stores, independent 
stores, department stores, and specialty stores. For each item, the prices 
reported by the various sources are averaged, with appropriate weights, 
to ascertain average price changes for the city. 

Index numbers are prepared monthly for the United States and for 
each of five large cities, and quarterly for fifteen additional cities. Index 
numbers for five of the fifteen cities are given each month. Price changes 
within each city are averaged and combined by a procedure which is 
essentially a weighted aggregative, the weights being the proportionate 
expenditure in the “market basket” for the subgroup (which each 
item represents) in the survey of 8,000 families referred to above. For
example, the weight assigned to the price of white bread is the propor- 
tionate expenditure in the “market basket” for all bread and plain rolls. 
When the price changes for the various cities are combined to obtain 
figures for the United States, each city is given a weight “proportionate 
to the wage-earner and clerical-worker population it represents in the 
index.” City weights are adjusted as new Census population figures
become available. As noted in the preceding chapter, this index is 
frequently used as the basis of reference for escalator clauses in wage 
agreements. 
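
The two-stage weighting described above (items within a city, then cities within the country) can be sketched as follows. Every figure, item, and city in this example is hypothetical and is introduced only for illustration; the code is not the Bureau's actual procedure. It uses a weighted average of price relatives with base-period expenditure weights, which is algebraically the same as a weighted aggregative with base-period quantity weights.

```python
def weighted_average(values, weights):
    """Weighted arithmetic mean; the weights need not sum to 1."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical "market basket" expenditure weights for three items.
item_weights = [0.5, 0.3, 0.2]

# Hypothetical price relatives (base period = 100) for the items in two cities.
city_relatives = {
    "City A": [110.0, 105.0, 120.0],
    "City B": [108.0, 102.0, 115.0],
}

# Hypothetical population weights used to combine the cities.
city_population = {"City A": 3.0, "City B": 1.0}

# Stage 1: one index per city.
city_index = {
    city: weighted_average(relatives, item_weights)
    for city, relatives in city_relatives.items()
}

# Stage 2: combine the city indexes into one over-all figure.
national_index = weighted_average(
    [city_index[city] for city in city_index],
    [city_population[city] for city in city_index],
)
print(city_index, round(national_index, 3))
```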

United States Bureau of Labor Statistics Index of Wholesale
Commodity Prices. This index,^ on a 1947-1949 base, is kept up to
date on an annual, monthly, weekly, and, for spot prices, daily basis. 
It measures the general rate and direction of the composite price move- 
ments in primary markets and the specific rates and directions of price 
movements for individual commodities and groups of commodities. 
The majority of the quotations used in the index are producers’ prices
rather than wholesalers’ prices. The index is designed to measure price 
changes between periods of time, not changes occasioned by shifts in 
quality, quantity, terms of sale, and so on. 

The index includes approximately 2,000 commodities ranging from raw 
material to finished products, which are “intended to account, directly
or indirectly, for all sales of all products (including both imports and
exports) at the primary market level in the United States in 1947.”
The “primary market level” refers to the first important commercial
transaction for each commodity. 

The commodities included in the index are classified into 15 major 
groups and 88 subgroups. Each subgroup is further divided into “product
classes, which are groups of commodities produced by one or more
related industries, and which are also characterized by similarity of price
movement, raw materials, or production process.” The major groups
are: 

1. Farm Products 

2. Processed Foods 

3. Textile Products and Apparel 

4. Hides, Skins, and Leather Products 

5. Fuel, Power, and Lighting Materials 

6. Chemicals and Allied Products 

7. Rubber and Products 

8. Lumber and Wood Products 

9. Pulp, Paper, and Allied Products 

10. Metals and Metal Products 

11. Machinery and Motive Products 

12. Furniture and Other Household Durables 


^ For a detailed description, see U. S. Bureau of Labor Statistics, Monthly Labor
Review, February 1952, “Description of the Revised Wholesale Price Index,”
by Edgar I. Eaton.

13. Nonmetallic Minerals — Structural

14. Tobacco Manufactures and Bottled Beverages 

15. Miscellaneous 

Groups 3 through 15 are also combined into a still larger category, “all
commodities other than farm products and processed foods,” that is to
say, industrial products. As a result, the three divisions, (1) farm
products, (2) processed foods, and (3) all commodities other than farm
products and processed foods, are available.

The 2,000 commodities do not constitute a random sample. They are 
generally the most important ones in each field or, in some cases, though 
not important in terms of sales volume, “appear to offer good representation
of price movements because of certain industry or trade characteristics.”
The selection of commodities was based upon “knowledge of
each industry and its important products” and ordinarily was preceded
by “consultation with leading trade associations and manufacturers in
each field.”

The index is basically a weighted aggregative with 1947 quantities 
being ordinarily, but not always, used as the weights. To cope with the 
necessity of making allowances for changing specifications of individual 
commodities, the Bureau computes commodity indexes “by chaining
together the month-to-month price relatives and weighting these by the
value of sales, rather than the absolute prices weighted by physical
quantities.” This procedure also facilitates substitution of one commodity
for another and alteration of the system of weights.
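
A minimal sketch of chaining value-weighted month-to-month relatives is given below. The two commodities, their sales values, and their relatives are hypothetical, and the helper names `combine_relatives` and `chain_index` are assumptions made for this illustration; the Bureau's own computations are considerably more detailed.

```python
def combine_relatives(relatives, sales_values):
    """Average several month-to-month relatives, weighting by value of sales."""
    total = sum(sales_values)
    return sum(r * v for r, v in zip(relatives, sales_values)) / total


def chain_index(monthly_relatives, start=100.0):
    """Chain month-to-month relatives (previous month = 100) into an index."""
    index = [start]
    for relative in monthly_relatives:
        index.append(index[-1] * relative / 100.0)
    return index


# Hypothetical data: two commodities with sales values of 60 and 40.
sales_values = [60.0, 40.0]
# Each row holds the two commodities' relatives for one month.
relatives_by_month = [[102.0, 99.0], [101.0, 103.0]]

combined = [combine_relatives(row, sales_values) for row in relatives_by_month]
series = chain_index(combined)
print([round(x, 4) for x in series])
```

Because each link depends only on the change from the preceding month, a commodity can be replaced at any link without disturbing the earlier part of the series.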

Index numbers are published monthly for “all commodities,” for the
15 major groups and the 88 subgroups, for the three divisions, for product
classes, and for individual series. Weekly index numbers are released
for “all commodities,” for the three divisions, and for certain subgroups
based upon about 200 commodities included in the monthly index and
upon estimates of the prices of the other commodities. Each weekly
index number is “an estimate of what the monthly index would be if it
were computed each week.” A daily index of spot market prices is also
prepared. 

Indexes of Prices Paid by and Received by Farmers; Parity
Ratio. To facilitate attempts to raise agricultural prices to an official
“parity” with industrial prices, the Agricultural Marketing Service
computes two indexes^ with the legally specified base 1910-1914. One is

^ Based largely upon “Prices Received and Paid by Farmers,” pp. 10-13 of Historical
and Descriptive Supplement to Economic Indicators, issued by the Joint Committee
on the Economic Report. A more detailed description is in Bureau of Agricultural
Economics, Agricultural Economics Research, April 1950, pp. 33-62, “The
Revised Price Indexes,” by B. R. Stauber, N. M. Koffsky, and C. K. Randall.

called the Index of Prices Paid by Farmers and is termed the Parity
Index when interest on farm mortgage debt, taxes on farm real estate, 
and cash wages paid to hired hands are included. The other index is 
referred to as the Index of Prices Received by Farmers. The ratio 
of the index of prices received to the Parity Index, for any given period, 
is the “parity ratio.”

The index of prices paid by farmers includes 344 price series. Index 
numbers are published monthly for 15 subgroups. Six of these subgroups 
are combined to form an index of expenditures for family living; nine of 
them are brought together to make an index of expenditures for pro- 
ducing farm products. These two group indexes are merged to form the 
Index of Prices Paid by Farmers. When combining the prices of individual
commodities, quantity weights are used. In most cases, the
weights were derived from a survey of expenditures, by dividing the 
expenditure for each commodity by the average price of that commodity 
in 1937-1941. When subgroup and group component indexes are com- 
bined, they are weighted by the amounts spent by farmers during the 
years 1937-1941. The index does not measure price changes alone, since 
it is affected by changes in quality of the commodities commonly stocked 
by merchants and changes in quality of goods bought by farmers as they 
adjust to higher or lower income levels. 

The index of prices received is based upon 50 commodities which make 
up about 95 per cent of total cash receipts from marketings of all farm 
commodities, including both crop and livestock but not timber and forest 
products and certain other minor categories. The prices used to make 
the index are those for “all grades and qualities of the important agricultural
commodities at the point of first sale, which generally is the local
market.” The index is essentially a weighted aggregative. United
States average prices for individual commodities are combined into sub- 
group indexes, the weights being the quantities sold by farmers during 
1937-1941. When subgroup indexes are combined to form group and 
all-commodity indexes, the weights are “the percentage which cash
receipts from marketing for the particular commodity subgroups bear to
the total for the same period — 1937-1941.” Index numbers appear
monthly for all farm commodities, all crops, livestock, and livestock 
products, and for 13 subgroups. Like the index of prices paid, this index 
does not measure price changes alone, since it involves average prices for 
all grades and qualities of the various commodities. 

The “parity ratio” mentioned at the outset undertakes to measure the
extent to which prices received by farmers are higher or lower in relation
to the prices they pay than they were in the base period, 1910-1914.
This parity ratio was first provided for in the Agricultural Adjustment
Act of 1933, which undertook to “re-establish prices to the farmer at a
level that will give agricultural commodities a purchasing power with
respect to the articles that farmers buy equivalent to the purchasing
power of agricultural commodities of the base period,” which was set at
1910-1914.
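
As a small illustration, the parity ratio can be computed from the two indexes as below. The index values are hypothetical, and expressing the ratio as a percentage is an assumption made here for readability.

```python
def parity_ratio(prices_received_index, parity_index):
    """Ratio of the Index of Prices Received to the Parity Index, in per cent.

    Both indexes are assumed to be on the same base (1910-1914 = 100).
    """
    return 100.0 * prices_received_index / parity_index


# Hypothetical index values for one month.
ratio = parity_ratio(prices_received_index=250.0, parity_index=280.0)
print(round(ratio, 1))
```

A ratio below 100 would indicate that prices received have risen less, relative to 1910-1914, than the prices farmers pay.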

Common stock prices. The Securities and Exchange Commission 
stock price index® is based upon the weekly closing prices of 265 of the 
more active common stocks listed on the New York Stock Exchange. 
(There are about 1,000 stocks listed.) These 265 stocks were selected as 
follows: First, all companies with common stock listed on the New York 
Stock Exchange were classified according to the standard industrial 
classification set up by the Bureau of the Budget. Second, each industry 
group which accounted for more than one per cent of the volume or value 
of common stock trading on the Exchange in 1949 was picked for repre- 
sentation in the index. There were 29 such groups. Third, from each 
of these industry groups the most active stocks were selected. Enough 
were included to comprise at least 65 per cent of the volume or value of
trading in each group in 1949. 

When combining the weekly closing prices of the various stocks, each 
stock price is weighted by the number of outstanding shares of that 
stock. Each resulting product is the current market value of the par- 
ticular stock. The current-market-value figures are then added together 
for each of the 29 industry groups and for all groups combined and 
expressed as percentages of 1939 to yield the indexes. The base figure
for 1939 is the average of the 52 weekly products for that year. Index
numbers are issued weekly, monthly (based on 4 or 5 weeks), and annually
(based on 52 weeks).
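
The weighting just described amounts to dividing the current market value of the selected stocks by their average market value in the base year. The sketch below uses two hypothetical stocks with invented prices, share counts, and base value; the function names are assumptions for this example only.

```python
def market_value(prices, shares_outstanding):
    """Sum of price times outstanding shares over the selected stocks."""
    return sum(p * s for p, s in zip(prices, shares_outstanding))


def value_weighted_index(prices, shares_outstanding, base_value):
    """Current market value as a percentage of the base-year value."""
    return 100.0 * market_value(prices, shares_outstanding) / base_value


# Hypothetical group of two stocks (shares in millions, prices in dollars).
shares = [10.0, 20.0]
closing_prices = [30.0, 25.0]
base_value_1939 = 500.0  # hypothetical average of the 52 weekly values, $ millions

group_index = value_weighted_index(closing_prices, shares, base_value_1939)
print(group_index)
```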

Several characteristics of this index should be borne in mind when 
using it: (1) Each industry group is weighted according to the value of 
the selected issue or issues taken from it, and not necessarily according 
to the value of all listed stocks in the particular group. (2) The index 
will not necessarily reflect price movements of stocks not listed on the 
New York Stock Exchange. (3) The index will not necessarily indicate 
the behavior of the prices of the less active listed stocks. 

One problem faced by the Securities and Exchange Commission was 
that of adding a stock, dropping a stock, or substituting one stock for 
another. The necessary adjustment is made in the base value rather 
than by the use of chain indexes. If new shares are to be added, they are 
given a theoretical base-year (1939) value which is added to the base 
value of the industry group. Similarly, if a stock is dropped, a theoretical 

® This description is based upon information in Computation of the S. E. C. Index,
an undated mimeographed release of the Securities and Exchange Commission, and
on p. 13 of Historical and Descriptive Supplement to Economic Indicators, issued by the
Joint Committee on the Economic Report.

base-year value for that stock is subtracted from the base-year value for
the group. Adjustments for changes in capitalization of a company, the 
stock of which is in the index, are handled in similar fashion. The follow- 
ing example® shows how one stock is substituted for another: 

On February 26, 1954, Minnesota Mining and Manufacturing Co. 
was substituted for Celotex Corp. in the Stone, Clay and Glass Product 
Index. On that date, the value of the outstanding shares of Celotex
Corp. was $16 million, and the value of Minnesota Mining and Manufacturing
Co. was $465 million. The current value of all shares in the
Stone, Clay and Glass Product Index was $1,686 million before any
substitution was made. The base value of the industry group was
$936 million so that the index for February 26 was $1,686 million divided
by $936 million which equals 180.1%. The theoretical value in 1939
of the Celotex shares was $16 million divided by 180.1%, or $9 million,
and the value of the Minnesota Mining and Manufacturing Co. shares
was $465 million divided by 180.1%, or $258 million. The adjusted base
value is then 

$936,000,000 - $9,000,000 + $258,000,000 = $1,185,000,000. 

This same result can be arrived at more simply by the use of the 
formula set forth in our pamphlet “Computation of the SEC Index”:

$$\text{New base value} = \text{Old base value} \times \frac{\text{New current value}}{\text{Old current value}}$$

When Minnesota Mining and Manufacturing Co. is substituted for
Celotex, the new current value of the industry group becomes $2,135
million.^® From the formula,

$$\text{New base value} = \$936{,}000{,}000 \times \frac{\$2{,}135{,}000{,}000}{\$1{,}686{,}000{,}000} = \$1{,}185{,}000{,}000.$$

Thus, the new current value, $2,135 million, divided by the new base
value, $1,185 million, gives the same index of 180.1.
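
The substitution example above can be checked with a short computation. The dollar figures (in millions) are those quoted in the example; the function name `adjusted_base_value` is an assumption introduced for this sketch.

```python
def adjusted_base_value(old_base, old_current, new_current):
    """New base value = old base value * new current value / old current value."""
    return old_base * new_current / old_current


old_base = 936.0       # base value of the industry group, $ millions
old_current = 1686.0   # current value before the substitution
new_current = old_current - 16.0 + 465.0  # drop Celotex, add Minnesota Mining

new_base = adjusted_base_value(old_base, old_current, new_current)
index_before = 100.0 * old_current / old_base
index_after = 100.0 * new_current / new_base

print(round(new_base), round(index_before, 1), round(index_after, 1))
```

Scaling the base value in the same proportion as the current value is what leaves the index unchanged on the date of the substitution.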

INDEXES OF PHYSICAL VOLUME AND OF BUSINESS ACTIVITY

Federal Reserve Index of Industrial Production. This index,^
which is issued monthly by the Board of Governors of the Federal Reserve
System, uses 1947-1949 as the base period and measures changes in the

® Furnished by the Securities and Exchange Commission. 

This figure is obtained as follows:

Old current value ................................... $1,686 million
Less, Celotex Corp. .................................     16 million
                                                      $1,670 million
Plus, Minnesota Mining and Manufacturing Co. ........    465 million
New current value ................................... $2,135 million


Based on discussions in Joint Committee on the Economic Report, Historical and
Descriptive Supplement to Economic Indicators, pp. 24-27, and in Federal Reserve
Bulletin, December 1953.

physical volume of output in manufacturing and mining. Individual
series are combined to form indexes for products and industries and for 
groups of industries in conformity with the Standard Industrial Classifi- 
cation developed by the Bureau of the Budget. The over-all index, 
Industrial Production, is divided into Manufactures and Minerals. 
These two are divided into subgroups with Manufactures having two 
major subgroups: Durable Manufactures and Nondurable Manufactures. 

The industries covered by the Index of Industrial Production account 
for about one-third of the national income. Among the important areas 
of the economy which are not covered are: construction, public utilities, 
transportation, trade, services, and agriculture. 

The index is intended to measure physical output, but a great many 
industries do not, or cannot, provide physical output data. As a result, 
the Board must sometimes use related series which tend to fluctuate more 
or less closely with output. Among these are man-hours, shipments, and 
materials consumed. In some instances, the monthly series can be cor- 
rected after annual data of physical output become available. The basic 
figures are expressed in terms of output per working day. 

The Board actually calculates two indexes, a monthly one and an 
annual one. Formerly, the annual index numbers were merely averages 
of the monthly ones, but as a result of a recent revision, they are now 
developed independently and are based upon more detailed information 
than are the monthly index numbers. The monthly index of manufac- 
tures includes 164 product or industry series; the annual index, about 
1,370 series. The monthly index of minerals is based upon 11 series; the 
annual index is composed of about 70 series. 

The method of combining the data for the individual series involves: 
(1) converting each series into percentages of the average monthly output
in the base period, 1947-1949; (2) multiplying each series of relatives by
a base-year weight factor expressed as a percentage of the weight assigned 
to all the series; and (3) adding the products resulting from step (2). 
The weights used are based on value added in 1947, value added being the 
difference between the value of products and the cost of materials or 
supplies consumed. In some instances, data of value added are not 
available but must be estimated. An adjustment is made for the fact 
that the data of value added refer to 1947 while the production relatives 
are in relation to 1947-1949. This consists of dividing each value-added 
figure for 1947 by the corresponding quantity relative for 1947. These 
quotients are then expressed as percentages of the total of such quotients 
for all manufacturing and mining and become the weights. 
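
The three steps and the weight adjustment described above can be sketched as follows. The two series and all of their figures are hypothetical, and the function names are assumptions for this illustration; the sketch is not the Board's actual computation.

```python
def production_weights(value_added_1947, relative_1947):
    """Adjust 1947 value-added figures to the 1947-1949 base.

    Each value-added figure is divided by the series' own 1947 quantity
    relative (1947-1949 = 100); the quotients are then expressed as
    percentages of their total.
    """
    quotients = [va / rel for va, rel in zip(value_added_1947, relative_1947)]
    total = sum(quotients)
    return [100.0 * q / total for q in quotients]


def combined_index(relatives, weights):
    """Multiply each relative by its percentage weight and add the products."""
    return sum(r * w / 100.0 for r, w in zip(relatives, weights))


# Hypothetical data for two series.
value_added_1947 = [300.0, 120.0]  # value added in 1947
relative_1947 = [100.0, 120.0]     # 1947 output as a percentage of 1947-1949
current_relatives = [120.0, 80.0]  # current output as a percentage of 1947-1949

weights = production_weights(value_added_1947, relative_1947)
index = combined_index(current_relatives, weights)
print([round(w, 2) for w in weights], round(index, 2))
```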

Index numbers for the 26 major industry groups and for the larger
categories (durable and nondurable manufactures, manufactures, minerals,
and industrial production) are given on a seasonally adjusted basis
as well as unadjusted. Seasonal adjustments are made for the 26 group
indexes rather than for the individual series making up the indexes. 
Seasonally adjusted figures for the larger categories are obtained as the 
result of combining seasonally adjusted group indexes. 

The American Telephone and Telegraph Company Index of 
Industrial Activity. The American Telephone and Telegraph Com- 
pany presents its index^^ of industrial activity in two forms: first, as per- 
centages of 1939 with adjustment for seasonal but not for trend, and, 
second, as percentage deviations from trend. 

The components of the index consist of 25 series divided into 7 cate- 
gories. Each component is adjusted for seasonal variation and is 
weighted “approximately in accordance with its representativeness as a 
measure of industrial activity.” Modifications are made in the index to
conform with the Census of Manufactures in the years when the censuses
are taken, if such adjustments are warranted. The seven major groups
forming the index and their weights are:

Component                                   Basic series used                                Weight

Metals                                      5 series on production, consumption, and
                                            shipments                                         0.30
Textiles                                    4 series on consumption and deliveries             .16
Man-hours in four manufacturing industries  4 series from chemicals and allied products;
                                            stone, clay, and glass products; petroleum
                                            and coal products; rubber products                 .15
Industrial power and man-hours              3 series on electric power and man-hours           .15
Food                                        5 series on livestock slaughter, grindings of
                                            grain, and malt liquor production                  .10
Paper and printing                          3 series on production and consumption             .10
Lumber                                      1 series on production                             .06
  Total                                                                                       1.00


The trend of the index was obtained by fitting an exponential curve 
(see pages 290-294) to annual figures of industrial activity per capita for 
the period 1860-1914 and a second such curve to the data for 1901-1950. 
Trend values through June 1907 were obtained from the first curve, 
while trend values for July 1907 and thereafter were gotten from the 
second curve. The trend values for the index of industrial activity are
the result of multiplying these per capita trend values by the population
expressed as percentages with 1939 = 100. 

The New York Times Weekly Index of Business Activity. The
New York Times compiles a weekly index^® of business activity composed


^® This description is taken from information in Index of Industrial Activity, issued
by the Comptroller’s Department of the American Telephone and Telegraph Company,
June 1951.

Based on information supplied by the New York Times.

of six series, each of which is expressed as a percentage of the estimated
normal for that series. Since the base periods for the components vary, 
the combined index is in terms of ‘‘an estimated normal,” and no specific 
year, or average of several years, is set equal to 100. Each component 
series is adjusted for seasonal variation, with the exception of steel ingot 
production, which is expressed as percentage of capacity. This com- 
ponent, as well as the electric power series, is also adjusted for trend. 

The estimated normals for the six series and their weights are shown 
below. The effective weight for each series represents “its relative 
importance and reliability as a business indicator.” The adjusted 
weights are gotten from the effective weights by dividing the effective 
weights by the amplitude of each index during 1934-1939. This adjust- 
ment is intended to “give the assigned weights their true value undis- 
torted by the varying cyclical amplitudes of the various components.” 


Series                      Estimated       Estimated normal value               Effective   Adjusted
                            normal period                                        weight      weight

Miscellaneous carloadings   1928-1937       48,000 cars per day                     22         0.21
Other carloadings           1928-1937       72,000 cars per day                     12          .13
Steel ingot production      1919-1939       69 per cent of capacity                 25          .10
Electric power production   Straight-line trend with weekly increment of
                            186,000 kilowatt hours per day                          17          .38
Paperboard production       1933-1939       20,240 tons per day                     12          .10
Lumber production           1933-1939       35,660 thousand board feet per day      12          .06
  Total                                                                                        0.98


The weights total 0.98 instead of 1.00 because two series were dropped 
during World War II and paperboard production was subsequently 
added. Therefore, after each series has been multiplied by its adjusted 
weight, the sum must be multiplied by 1.024. 
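
The final combination can be sketched as below. The adjusted weights and the factor 1.024 are those given in the text; the component readings are hypothetical (every series is set at exactly its estimated normal), and the function name is an assumption for this example.

```python
# Adjusted weights as tabulated in the text.
ADJUSTED_WEIGHTS = {
    "miscellaneous carloadings": 0.21,
    "other carloadings": 0.13,
    "steel ingot production": 0.10,
    "electric power production": 0.38,
    "paperboard production": 0.10,
    "lumber production": 0.06,
}
CORRECTION_FACTOR = 1.024  # factor stated in the text


def business_activity_index(percent_of_normal):
    """Weight each component (per cent of its normal), add, and apply the factor."""
    weighted_sum = sum(
        ADJUSTED_WEIGHTS[name] * percent_of_normal[name]
        for name in ADJUSTED_WEIGHTS
    )
    return CORRECTION_FACTOR * weighted_sum


# Hypothetical week in which every component stands at its estimated normal.
at_normal = {name: 100.0 for name in ADJUSTED_WEIGHTS}
combined = business_activity_index(at_normal)
print(round(sum(ADJUSTED_WEIGHTS.values()), 2), round(combined, 3))
```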

Indexes of Business Activity in the Pittsburgh District. Various
organizations compile and issue local indexes of business conditions.
This index, prepared by the Bureau of Business Research of the University
of Pittsburgh, refers to activity in the four counties surrounding
and including Pittsburgh, but one component, bituminous coal production,
is for the entire Western Pennsylvania producing area. The index
is issued weekly and monthly. Since the weekly and monthly indexes

Derived from University of Pittsburgh, Bureau of Business Research, Description
of the Monthly Indexes of Business Activity in the Pittsburgh District, January 1952;
Description of the Weekly Indexes of Business Activity in the Pittsburgh District, January
1952; and by correspondence.

are not quite identical as to components, the following description will
refer to the monthly index. 

In addition to the general monthly index, indexes are also issued for 
production, iron and steel production, trade, shipments, and bituminous 
coal production. All indexes are adjusted for seasonal variation and are
computed both with and without adjustment for trend. The base 
period is 1935-1939. Sixteen series are used to make the indexes. These 
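and their weights are:

Industrial production ................................... 51
    Rate of steel operations ............................ 15
    Pig iron production .................................  8
    Industrial electric power consumption ............... 10
    Man-hours in manufacturing ..........................  5
    Slaughterhouse activity .............................  2
    Combined manufacturing ..............................  2
    Bituminous coal production ..........................  9
Originating shipments ................................... 22
    Carloadings of coal .................................  5
    Carloadings of coke .................................  2
    Carloadings of iron and steel products ..............  6
    Carloadings of all other commodities ................  7
    River traffic .......................................  2
Trade ................................................... 27
    Retail .............................................. 15
    Wholesale ...........................................  5
    New motor car registrations .........................  2
    Produce receipts ....................................  2
    Bank debits .........................................  3
Total ...................................................100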


The weights assigned to the series are in proportion to value added in 
1928. “Value added” is value of products, or gross receipts, less cost of 
materials, supplies, fuel, purchased energy, and contract work. The 
16 individual series are each expressed as relatives (1935-1939 = 100)
before the weights are applied. Carloading figures in the series included 
under “originating shipments” are adjusted to reflect tonnage. Retail 
and wholesale sales and bank debits are adjusted for price changes. 

INDEXES OF QUALITATIVE CHANGES OR DIFFERENCES 

Farm housing indexes for Oklahoma. An unusual application
of index-number technique has been made by Robert T. McMillan^^ in
his indexes which undertake to compare the counties of Oklahoma according
to certain criteria at a specific time. McMillan’s indexes do not,

“Comparisons of Farm Housing Indexes for Oklahoma,” by Robert T. McMillan,
Social Forces, December 1945, pp. 174-180. McMillan was primarily interested in
comparing the four methods mentioned here. He decided that they produced equally
satisfactory indexes of housing.

therefore, involve changes from one period to the next, but rather geographical
differences.

Sixteen housing measures for each of the 77 counties of Oklahoma are 
used to construct four different indexes of rural-farm housing. Each 
index yields an index number for each county. Two of the indexes are 
of special interest. In one of these, the counties are merely ranked in 
regard to each of the 16 measures; then the ranks are added and divided 
by 16. In the other one, each county receives a relative for each of the 
16 series, the relative being based on the ratio of (1) the county
value in a series to (2) the corresponding figure for the state. For
example, one ratio is (1) the percentage of homes in the county with
running water divided by (2) the percentage of homes in the state with
running water. The 16 relatives for each county are then averaged.
Of the other two indexes, one employs standard scores (see pages 224-225) 
while the other uses factor analysis^® and but five of the 16 measures. 

The 16 measures used for the indexes were the percentages of homes 
with: running water, flush toilet, bathtub or shower, electric lighting,
fewer than persons per room, radio, mechanical refrigeration, value of 
$1,000 or more (for owner-occupied homes), 7 rooms or more, major 
repairs not needed, gas or electricity for cooking, central heating, and 
monthly rental of $10 or over; also the percentage of owner-occupied 
homes and the percentage of farms having one or more automobiles and 
the percentage with telephone. 
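
The two simpler indexes described above, the average of ranks and the average of relatives to the state figure, can be sketched as follows. The three counties, the two measures, and the state figures are hypothetical; giving rank 1 to the highest value and ignoring ties are assumptions made for this illustration, not details taken from McMillan's article.

```python
def rank_index(measures_by_county):
    """Average rank of each county over all measures (rank 1 = highest value)."""
    counties = list(measures_by_county)
    n_measures = len(measures_by_county[counties[0]])
    rank_totals = {county: 0 for county in counties}
    for j in range(n_measures):
        ordered = sorted(
            counties, key=lambda c: measures_by_county[c][j], reverse=True
        )
        for rank, county in enumerate(ordered, start=1):
            rank_totals[county] += rank
    return {county: rank_totals[county] / n_measures for county in counties}


def relative_index(measures_by_county, state_values):
    """Average, over all measures, of 100 * county value / state value."""
    return {
        county: sum(100.0 * v / s for v, s in zip(values, state_values))
        / len(values)
        for county, values in measures_by_county.items()
    }


# Hypothetical percentages for two housing measures in three counties.
measures = {"A": [40.0, 60.0], "B": [20.0, 80.0], "C": [30.0, 50.0]}
state = [30.0, 60.0]

ranks = rank_index(measures)
relatives = relative_index(measures, state)
print(ranks)
print({county: round(value, 2) for county, value in relatives.items()})
```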

Non-chronological index numbers, which undertake to measure geo- 
graphical differences or differences between categories, are not often 
encountered, and relatively few are in current use. Attempts have been 
made to use index numbers to measure the adequacy of state care of 
mental patients,^ with comparisons being undertaken both between
states and between two different years; to compare the religious work of 
dioceses;^® to rate the agricultural value of soils;^^ and to compare state 
school systems with each other. 

At the beginning of Chapter 17 it was noted that index numbers may 

Factor analysis is beyond the scope of this text. See J. P. Guilford, Psychometric
Methods, McGraw-Hill Book Company, Inc., New York, 1954, Second Edition,
Chapter 16.

See “Indices of Adequacy of State Care of Mental Patients,” by Ellen Winston,
American Sociological Review, April 1938, pp. 190-202.

Consult J. Elliott Ross, An Index Number for American Dioceses, The Shield
Press, Racine, Wisconsin, and the National Catholic Welfare Conference, Washington,
D. C., undated pamphlet.

See R. Earl Storie, An Index for Rating the Agricultural Value of Soils, Bulletin
556 of the California Agricultural Experiment Station, Berkeley, California.

Brief descriptions of two such indexes are given in the first edition of this text,
pp. 645-618.

be employed to make chronological, geographical, or categorical comparisons
in various areas of human activity. Since the vast majority of
indexes have to do with price variations and many others deal with 
fluctuations of production, the illustrations in the preceding paragraphs 
were mentioned in order to call attention to some of the diverse fields in 
which index-number technique has been used. The reader may readily 
sense its applicability to other sociological or educational data, to psy- 
chology, to medicine, and to other fields far removed from the monetary 
and physical-volume concepts of economics and business. 

SOURCES OF CURRENT INDEX NUMBERS 

A large proportion of the most useful economic indexes that are
published currently can be found in the following periodicals:

(1) Survey of Current Business, published monthly, with biennial
supplements, by the United States Department of Commerce, Office of
Business Economics.

(2) Federal Reserve Bulletin, issued monthly by the Board of Governors
of the Federal Reserve System.

(3) Economic Indicators, released monthly as a Congressional Docu- 
ment. Prepared by the Council of Economic Advisers for the Joint 
Committee on the Economic Report. 

Other sources of statistical data, in some of which index numbers 
appear, will be found in Appendix U. 



Symbols Used in Chapter 19


$a$: the value of $Y_c$ when $X = 0$ in the equation $Y_c = a + bX$.
$a'$: the value of $X_c$ when $Y = 0$ in the equation $X_c = a' + b'Y$.
$a_1$: number of observed frequencies in the upper left cell of a 2 × 2 table.
$a_2$: number of observed frequencies in the lower left cell of a 2 × 2 table.
$b$: the slope of the estimating equation $Y_c = a + bX$.
$b'$: the slope of the estimating equation $X_c = a' + b'Y$.
$b_1$: number of observed frequencies in the upper right cell of a 2 × 2 table.
$b_2$: number of observed frequencies in the lower right cell of a 2 × 2 table.
$C$: coefficient of mean square contingency.
$d_x$: deviation of a cell, in terms of classes, from $\bar{X}_d$.
$d_y$: deviation of a cell, in terms of classes, from $\bar{Y}_d$.
$D$: difference between the ranks of paired values.
$f$: a frequency; in grouped correlation, a frequency in a cell.
$f_x$: a frequency of the X series; in grouped correlation, a column frequency.
$f_y$: a frequency of the Y series; in grouped correlation, a row frequency.
$k$: coefficient of alienation.
$k^2$: coefficient of non-determination.
$N$: the number of items in a sample. In two-variable correlation, N is
the number of pairs of items.
$r$: coefficient of correlation.
$r^2$: coefficient of determination.
$r_{rank}$: coefficient of rank correlation.
$s_X$: standard deviation of the X series.
$s_Y$: standard deviation of the Y series.
$s_{Y.X}$: standard error of estimate for the estimating equation $Y_c = a + bX$.
$\Sigma$: upper-case Greek sigma, meaning “take the sum of.”
$\Sigma y^2$: total variation of the Y values.
$\Sigma y_c^2$: variation of Y explained by use of the estimating equation $Y_c = a + bX$.
$\Sigma y_s^2$: variation of Y unexplained by use of the estimating equation $Y_c = a + bX$.
$x$: $X - \bar{X}$.
$X$: the X series; also, an observed value in the X series. Thus, we refer
to correlating X and Y, but $\Sigma X$ means “sum the values in the X series.”
X axis: the horizontal axis.
$X_c$: a computed X value.
$\bar{X}$: the arithmetic mean of the X series.
$\chi^2$: chi-square. The symbol is a lower-case Greek chi.
$y$: $Y - \bar{Y}$. $\Sigma y^2$ is the total variation in the Y series.
$y_c$: $Y_c - \bar{Y}$. $\Sigma y_c^2$ is the explained variation in the Y series.
$y_s$: $Y - Y_c$. $\Sigma y_s^2$ is the unexplained variation in the Y series.
$Y$: the Y series; also, an observed value in the Y series. Thus, we refer to
correlating X and Y, but $\Sigma Y$ means “sum the values in the Y series.”
Y axis: the vertical axis.
$Y_c$: a computed Y value.
$\bar{Y}$: the arithmetic mean of the Y series.
$\bar{Y}_c$: the arithmetic mean of the $Y_c$ values; $\bar{Y}_c = \bar{Y}$.



CHAPTER 19 


Correlation I: Two-variable Linear
Correlation


One of the chief objectives of science is to estimate values of one factor
by reference to the values of an associated factor. “The scientific method
. . . consists in the careful and laborious classification of facts, in the
comparison of their relationship and sequences, and finally in the
discovery by the aid of disciplined imagination of a brief statement or formula,
which in a few words resumes a wide range of facts. Such a formula . . .
is termed a scientific law.”¹ When the relationship is of a quantitative
nature, the appropriate statistical tool for discovering and measuring the
relationship and expressing it in a brief formula is known as correlation.

A SIMPLE EXPLANATION 

It may surprise some of us to know that there is a very close relation- 
ship between temperature and the frequency with which crickets chirp. 
If, for instance, we should count the number of chirps made by a cricket 
in 15 seconds and add it to 37, we could closely approximate the Fahren- 
heit temperature at that time. Or, if we should multiply the degrees 
Fahrenheit by 3.78 and subtract 137 from the result, we could estimate 
the number of chirps to be expected from a cricket in one minute. This 
relationship would be found remarkably accurate, unless the temperature 
was below 45°. When the weather is colder than 45°, crickets do not
chirp. Likewise, it might not be accurate appreciably beyond 75°, since
observations have not been made beyond that temperature, and we do not 
know, therefore, if the relationship holds for higher temperatures. 

The relationship between these two variables, temperature and cricket
chirps, is displayed in Chart 19.1, known as a scatter diagram. Each dot
represents an observation of one cricket. Thus, observation A represents
a cricket which, at a temperature of 59.0°, chirped 85 times per minute.

¹ Karl Pearson, The Grammar of Science, p. 77, Adam and Charles Black, London,
1900.

The reader should notice that temperature is plotted along the X-axis,
while chirps per minute are plotted along the Y-axis. This is done
because the number of chirps per minute appears to be a direct result of
the temperature. In this case it is also true that we wish to estimate the
number of chirps to be expected at a given temperature; temperature is
therefore the independent variable, and chirps per minute the dependent
variable. Even though it were temperature we wished to estimate, it
would nevertheless be best to show the causal factor on the X-axis.
When the causal relationship is not clear, or when neither factor can be
said to be the cause of the other, then the variable to be estimated should
be plotted on the Y-axis.

Chart 19.1. Temperature and Chirps per Minute of 115
Crickets. Data provided by Mr. Bert E. Holmes.
[Scatter diagram with fitted straight line. Horizontal axis: temperature,
degrees Fahrenheit, 40 to 80. Vertical axis: chirps per minute.]

Judging from Chart 19.1, we see that the relationship between the two
variables is linear, for the straight line appears to be as good a fit as a
more complicated curve. The equation of this line² is

$$Y_c = -137.22 + 3.777X.$$

From this equation, estimates of chirps can be made for any desired
temperature within the limits of the observations shown on the chart. Thus,
if we wish to estimate the number of chirps when the temperature is 59.0°
(observation A), we find the number by substituting 59.0 for X in the
equation. Thus

$$Y_c = -137.22 + (3.777)(59.0) = 86 \text{ chirps}.$$

The estimate could be read, although less accurately, directly from the
estimating line plotted on the chart. Although the estimate (86) does not
agree perfectly with the actual observation of 85 chirps, the discrepancy
is not large.

² This equation was fitted by the authors to data furnished by Bert E. Holmes. See
also Bert E. Holmes, “Vocal Thermometers,” The Scientific Monthly, Vol. XXV,
September 1927, pp. 261-264.

Chart 19.2. A Scatter Diagram Illustrating Perfect Linear Correlation.
The correlation would also be perfect if the line on which the dots
lie had a negative, instead of a positive, slope. From F. E. Croxton, Elementary
Statistics with Applications in Medicine, Prentice-Hall, Inc., New York, 1953,
p. 112.

Chart 19.3. A Scatter Diagram Illustrating No Correlation. Various
other arrangements of dots are possible which will also show no
correlation. From same source as Chart 19.2.
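The cricket estimate worked out above can be sketched in a few lines of Python. This is only an illustration: the function name `estimated_chirps_per_minute` and the 45-to-75-degree guard are assumptions introduced here (the guard reflects the remarks that crickets do not chirp below 45° and that the relationship is not known much beyond 75°); the two constants are those of the fitted equation.

```python
def estimated_chirps_per_minute(temp_f: float) -> float:
    """Estimate chirps per minute from degrees Fahrenheit, Yc = -137.22 + 3.777X."""
    # Assumed guard: stay inside the range of temperatures discussed in the text.
    if not 45.0 <= temp_f <= 75.0:
        raise ValueError("temperature outside the range 45 to 75 degrees Fahrenheit")
    return -137.22 + 3.777 * temp_f


if __name__ == "__main__":
    # Observation A: 59.0 degrees, for which 85 chirps were actually counted.
    estimate = estimated_chirps_per_minute(59.0)
    print(f"estimated chirps per minute at 59.0 degrees: {estimate:.1f}")
```

Rounding the result for 59.0° to the nearest whole chirp gives the estimate compared with observation A.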

We cannot fail to be impressed with the adequacy of the generalization
expressed in the equation $Y_c = -137.22 + 3.777X$. Since most of the
dots are very close to the line, it appears that frequency of chirps has
been adequately explained by reference to temperature. The slight
variations from the estimating line are unexplained and may be due to
differences between individual crickets, differences associated with the
time of day or year in which the observations were made, humidity, and
inaccuracies of observation of temperature or number of chirps. Also,
the temperature at the spot where the cricket is chirping may be different
from that where the observer is standing. This might be the case if the
cricket were under a stone. An examination of other causes of variation,
in addition to temperature, involves consideration of three or more
variables, a procedure for which will be considered in Chapter 21 under the
heading of “Multiple Correlation.”

The closeness of the relationship may be expressed in general terms by
stating that the coefficient of correlation, r, is +0.9919. Since ±1.0 is
perfect correlation (see Chart 19.2) and 0 is no correlation (see Chart 19.3),
it should be obvious that one almost never finds a higher coefficient
than +0.9919. The plus sign indicates that the correlation is positive;
that is, that the chirps increase as the temperature increases. Had chirps
decreased with increasing temperature, the correlation would have been 
negative, or inverse; the sign of r would have been negative, as would the 
sign of b in the estimating equation; and the estimating line would have 
sloped downward to the right. 

An illustration of rather low correlation (—0.11) is given by Chart 19.4. 
In this case, brain weight was estimated by cranial capacity, and legisla- 
tive ability by a rather complicated system of scoring. But even if we 
assume that all measurements are accurate, the evidence certainly does 
not suggest that legislators should be selected solely from head measure- 
ments. Perhaps there are additional factors which account for legislative 
ability; for example, intelligence, education, initiative, honesty, social 
awareness, and other traits are doubtless important. 

CORRELATION THEORY 

Correlation may be thought of as involving three types of measure- 
ments, which may conveniently be made in the following order: 

(1) An estimating, or regression,³ equation which describes the functional
relationship between the two variables. As the name indicates, one
object of such an equation is to make estimates of one variable from
another.

(2) A measure of the divergence of the actual values of the dependent
variable from their estimated or computed values. This measure
is analogous to a standard deviation and gives an idea, in absolute terms,
of the dependability of estimates. It is called the standard error of estimate
($s_{Y.X}$).

(3) A measure of the degree of relationship, or correlation (r), between
the variables, independent of the units or terms in which they were
originally expressed. The square of this measure ($r^2$) enables us to state
the relative amount of variation in the dependent variable which has been
explained by the estimating equation.

³ The term “regression” entered statistical literature as a result of the use of
correlation by Galton to study biological regression (that is, the tendency to revert to a
common type or average). Since correlation analysis is applied to many types of
problems, the term “estimating” seems more appropriate.

Chart 19.4. Estimates of Brain Weight and Legislative
Ability of 09 Members of Congress. Data from “Brain Weight
and Legislative Ability in Congress,” by Arthur MacDonald,
Congressional Record, April 12, 1932.

The estimating equation. Foresters sometimes find it convenient
to estimate the height growth of trees from their growth in diameter, since
this procedure is quicker than direct measurements of the growth in
height. The scatter diagram, Chart 19.5, shows the breast-high diameter
growth and the growth in height of 20 trees, together with the estimating
line which describes the nature of the relationship between the two
variables. This straight line has been so fitted that the sum of the squares of
the Y deviations from it is less than those from any other straight line.
A curve fitted in this manner is usually considered by statisticians to be
the best with which to estimate values of one variable when values of the
other variable are known. The fitting of such a line is similar to the
fitting of a trend, and requires the use of the following normal equations:

$$\text{I.}\quad \Sigma Y = Na + b\,\Sigma X.$$

$$\text{II.}\quad \Sigma XY = a\,\Sigma X + b\,\Sigma X^2.$$

It will be remembered that the normal equations were discussed in 
Chapter 12. 


Chart 19.5. Breast-High Diameter Growth and
Height Growth of 20 Forest Trees. Data of Table 19.1.
[Scatter diagram with estimating line. Vertical axis: height growth in feet.]

Table 19.1 shows the computations that are necessary to determine the
values which must be substituted. The substitution yields:

$$\text{I.}\quad 173 = 20a + 90.7b.$$

$$\text{II.}\quad 856.0 = 90.7a + 453.93b.$$

Multiplication of all the items in Equation I by 4.535 permits us to cancel
out a by subtracting Equation I from Equation II. Thus

$$\begin{aligned}
\text{II.}\quad 856.0 &= 90.7a + 453.93b \\
(\text{I} \times 4.535).\quad 784.555 &= 90.7a + 411.3245b \\
71.445 &= 42.6055b \\
b &= 1.676896.
\end{aligned}$$

We may now substitute the value of b in Equation I in order to find the
value of a.

$$\begin{aligned}
\text{I.}\quad 173 &= 20a + 152.094467. \\
a &= 1.045277.
\end{aligned}$$
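The elimination above can be checked with a short Python sketch. It assumes nothing beyond the two normal equations; the function name `solve_normal_equations` is introduced here for illustration, and the five totals passed to it are those of Table 19.1.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2):
    """Solve  sum_y = n*a + b*sum_x  and  sum_xy = a*sum_x + b*sum_x2  for (a, b)."""
    # Eliminating a from the two equations leaves a single equation in b.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Substituting b back into Equation I gives a.
    a = (sum_y - b * sum_x) / n
    return a, b


if __name__ == "__main__":
    a, b = solve_normal_equations(n=20, sum_x=90.7, sum_y=173.0,
                                  sum_xy=856.0, sum_x2=453.93)
    print(f"a = {a:.3f}, b = {b:.3f}")
```

Carrying full precision through the substitution can change the sixth decimal of a, because the hand computation uses b rounded to six places; to three places the constants agree.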


TABLE 19.1

Determination of Values Used in Computing Estimating Equation
for Growth in Diameter and Height of 20 Forest Trees

Rank in diameter    Diameter growth     Height growth
growth (smallest    at breast height    in feet
to largest)         in inches (X)       (Y)                XY        X²       Y²

       1                 2.3                7              16.1      5.29      49
       2                 2.5                8              20.0      6.25      64
       3                 2.6                4              10.4      6.76      16
       4                 3.1                4              12.4      9.61      16
       5                 3.4                6              20.4     11.56      36
       6                 3.7                6              22.2     13.69      36
       7                 3.9               12              46.8     15.21     144
       8                 4.0                8              32.0     16.00      64
       9                 4.1                5              20.5     16.81      25
      10                 4.1                7              28.7     16.81      49
      11                 4.2                8              33.6     17.64      64
      12                 4.4                7              30.8     19.36      49
      13                 4.7                9              42.3     22.09      81
      14                 5.1               10              51.0     26.01     100
      15                 5.5               13              71.5     30.25     169
      16                 5.8                7              40.6     33.64      49
      17                 6.2               11              68.2     38.44     121
      18                 6.9               11              75.9     47.61     121
      19                 6.9               16             110.4     47.61     256
      20                 7.3               14             102.2     53.29     196

   Total                90.7              173             856.0    453.93   1,705

Data from Donald Bruce and F. X. Schumacher, Forest Mensuration, p. 124,
McGraw-Hill Book Company, New York, First Edition, 1935. Courtesy of Publisher
and Authors.


The values for a and b are checked by substituting in Equation II. While
this does not prove that no errors in computation have been made, yet
if the correct numbers were substituted in the two normal equations,
either no errors, or counterbalancing errors, have been made. Since
a = 1.045 and b = 1.677, the equation of the line which enables us to
estimate the growth in height of trees in this particular forest when their
growth in diameter is known may be stated as

$$Y_c = 1.045 + 1.677X.$$


Suppose now we wish to estimate the height growth of a tree which
grew 5.5 inches in diameter. Substituting in the equation, we have

$$Y_c = 1.045 + (1.677)(5.5) = 10.268 \text{ feet}.$$

Dependability of estimates. However, we should not expect all
trees which grew 5.5 inches in diameter to have grown exactly 10.268 feet
in height, for the dots of the scatter diagram do not all lie on the fitted line.
Rather, 10.268 should be thought of as an estimate of the average height
growth of all trees of the diameter growth indicated. We should expect
variations from this value the same as from the arithmetic mean of a
frequency distribution. It is therefore pertinent to inquire what
proportion of trees may be expected to fall within any range of error in which
we may be interested, assuming, of course, that we have a representative
sample.

To do this, it is necessary to compute the standard deviation of the Y
values, not from their mean, but from the line of estimation. On Chart
19.6, the vertical distance from the line of estimate to any Y value
represents the difference between the observed Y value and the estimated Y
value. The estimated Y values, $Y_c$, are obtained by solving the
estimating equation for each measurement of diameter growth, or X value. The
deviation $Y - Y_c$ represents the error that would have been made in one
particular instance. To obtain a summary measure of those deviations,
they may be squared, summed, divided by N, and the square root
extracted. This is the standard error of estimate,⁴ the symbol for which is
$s_{Y.X}$. Its formula may be written

$$s_{Y.X} = \sqrt{\frac{\Sigma (Y - Y_c)^2}{N}}.$$

In this illustration

$$s_{Y.X} = \sqrt{\frac{88.75}{20}} = \sqrt{4.438} = 2.107 \text{ feet}.$$

Calculations are shown in Table 19.2, Columns 7 and 10. Ordinarily the 
more expeditious method of calculation, which is explained on page 468, 
would be used. The above method is used solely to explain the meaning 
of the measure. 
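The direct method just described (square the deviations from the line, sum, divide by N, take the square root) can be sketched as follows. The helper names `fit_line` and `standard_error_of_estimate` are introduced here for illustration; the data are the 20 pairs of Table 19.1, and the divisor is N, as in the text, not N − 2.

```python
import math

# Diameter growth in inches (X) and height growth in feet (Y), from Table 19.1.
X = [2.3, 2.5, 2.6, 3.1, 3.4, 3.7, 3.9, 4.0, 4.1, 4.1,
     4.2, 4.4, 4.7, 5.1, 5.5, 5.8, 6.2, 6.9, 6.9, 7.3]
Y = [7, 8, 4, 4, 6, 6, 12, 8, 5, 7, 8, 7, 9, 10, 13, 7, 11, 11, 16, 14]


def fit_line(x, y):
    """Least-squares constants (a, b) of Yc = a + bX from the normal equations."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def standard_error_of_estimate(x, y):
    """Root of the mean squared deviation of Y from the fitted line (divisor N)."""
    a, b = fit_line(x, y)
    squared = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return math.sqrt(sum(squared) / len(x))


if __name__ == "__main__":
    print(f"standard error of estimate: {standard_error_of_estimate(X, Y):.3f} feet")
```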

This measure may be interpreted in a manner strictly analogous to that
of the standard deviation of a frequency distribution. It yields an
estimate of the range above and below the line of estimation within which
68.27 per cent of the items may be expected to fall if the scatter is normal.
In practice we frequently think of this measure as the range within which
about two-thirds of the values will be found. For the case in hand ($s_{Y.X} = 2.107$),
we may expect to find about two-thirds of the items of Chart 19.6 within the narrow
band $\pm s_{Y.X}$ shown in the diagram; about 95 per cent (ideally 95.45)
within the wider band that includes $\pm 2s_{Y.X}$; and practically all within
$\pm 3s_{Y.X}$ (theoretically, with a large number of items, 99.73 per cent of the
cases). A count of the dots shows that within $\pm s_{Y.X}$ of the line of
estimate, 13 of the 20 items (65 per cent) are found; within $\pm 2s_{Y.X}$ of the line,
19 of the items (95 per cent) appear; and within $\pm 3s_{Y.X}$ are included all
20 of the items. The slight discrepancies may have been due to the fact
that the sample was small and the scatter not normally distributed
around the estimating equation.

⁴ Although this measure is called the “standard error of estimate,” it is not a
standard error in the sense used in Chapters 24 and 25. $s_{Y.X}$ is the standard deviation of
the Y values around the estimating equation $Y_c = a + bX$.

Chart 19.6. Estimating Equation and Zones of ±1,
±2, and ±3 Standard Errors of Estimate, for Diameter
Growth and Height Growth of 20 Forest Trees.
Data of Table 19.2.
[Horizontal axis: diameter growth in inches. Vertical axis: height growth in feet.]
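The count of dots inside the one-, two-, and three-standard-error zones can be reproduced from the data of Table 19.1. The sketch below is illustrative only; the function name `items_within_zones` is an assumption of this example, and the line is refitted at full precision instead of using the rounded constants 1.045 and 1.677.

```python
import math

# Diameter growth in inches (X) and height growth in feet (Y), from Table 19.1.
X = [2.3, 2.5, 2.6, 3.1, 3.4, 3.7, 3.9, 4.0, 4.1, 4.1,
     4.2, 4.4, 4.7, 5.1, 5.5, 5.8, 6.2, 6.9, 6.9, 7.3]
Y = [7, 8, 4, 4, 6, 6, 12, 8, 5, 7, 8, 7, 9, 10, 13, 7, 11, 11, 16, 14]


def items_within_zones(x, y, multiples=(1, 2, 3)):
    """Count the items lying within k standard errors of the estimating line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope and intercept in deviation form: b = sum(xy) / sum(x^2), a = Ybar - b*Xbar.
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_yx = math.sqrt(sum(e * e for e in residuals) / n)
    return [sum(1 for e in residuals if abs(e) <= k * s_yx) for k in multiples]


if __name__ == "__main__":
    for k, count in zip((1, 2, 3), items_within_zones(X, Y)):
        print(f"within {k} standard error(s) of the line: {count} of {len(X)} items")
```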

TABLE 19.2

[Table 19.2 gives, for each of the 20 trees: (1) rank in diameter growth,
smallest to largest; (2) diameter growth at breast height in inches, X;
(3) height growth in feet, Y; (4) the computed value $Y_c$; (5) to (7) the
deviations $y = Y - \bar{Y}$, $y_c = Y_c - \bar{Y}$, and $y_s = Y - Y_c$; (8) to (10) the
squared deviations $y^2$, $y_c^2$, and $y_s^2$. Column totals: Y, 173; $Y_c$, 173.004;
$y^2$, 208.55; $y_c^2$, 119.81; $y_s^2$, 88.75.]

Although the standard error of estimate is a measure of the dispersion
of all of the Y values around the estimating equation, and is therefore a
general or over-all measure of dispersion, it is nevertheless often used to
indicate the dependability of specific estimates. It was calculated that
trees with growth in diameter of 5.5 inches should average 10.268 feet
in height growth. We may now amplify the statement by saying that,
if our sample is representative, about two-thirds of such trees should vary in height
growth between 8.16 feet and 12.38 feet (10.268 ± 2.107); or, considering
a slightly wider range, about 95 out of 100 should lie between 6.05 feet
and 14.48 feet. The proportion lying within any other range could
readily be computed also by referring to Appendix E.
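The ranges quoted above, and the normal-curve proportions attached to them, can be sketched as follows. This is an illustration under the same assumption the text makes (normal scatter about the line); the function names are introduced here, and `math.erf` from the Python standard library stands in for a table of normal-curve areas.

```python
import math


def estimate_range(estimate: float, std_error: float, k: float = 1.0):
    """Range of k standard errors of estimate on either side of an estimate."""
    return estimate - k * std_error, estimate + k * std_error


def normal_share_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of its centre."""
    return math.erf(k / math.sqrt(2.0))


if __name__ == "__main__":
    for k in (1, 2, 3):
        low, high = estimate_range(10.268, 2.107, k)
        share = 100.0 * normal_share_within(k)
        print(f"k = {k}: {low:.2f} to {high:.2f} feet, about {share:.2f} per cent")
```

Any other multiple of the standard error can be passed as `k` to obtain the corresponding proportion.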

These statements concerning range of error have to do, not with cer- 
tainty, but only with expectation. We have used only 20 items, and, 
even though the sample may have been carefully chosen, another sample 
of 20 would not give us precisely the same results as those obtained above. 
It might be that we could reduce uncertainty further, not only by increas- 
ing the size of our sample, but also by comparing variations in height 
growth with some other factor in addition to diameter growth — for 
example, age, since as trees grow older their rate of growth may change. 
Also, the character and quantity of plant food in the soil and the degree 
of crowding of the trees might be considered. Even if several factors in 
addition to diameter growth were considered (this is multiple correlation, 
discussed in Chapter 21), there would still be some unexplained variations, 
and therefore still some uncertainty. 

The correlation coefficient and explained variation. Another
measure closely related to the estimating equation and to the standard
error of estimate, is the coefficient of correlation r. The estimating
equation $Y_c = a + bX$ is a statement of the way in which the dependent
variable changes with variations in the independent variable. $s_{Y.X}$ is an
indication of the amount of dispersion in the dependent variable which we
have failed to account for by our line of estimation, but it is stated in
terms of the original data (in the case of the diameter-growth and
height-growth data, in feet). When stating the degree of relationship between
two variables, it is convenient to be able to employ concise numerical
terms which are independent of the units of the original data and to
express the degree of relationship between two series even if we do not know
either the equation of the line of estimation or $s_{Y.X}$. To be sure,
something is lost by so compressing the information, since it does not enable
us to make an estimate of the value of one variable from the other, or
to tell, in absolute magnitude, the degree of accuracy of any estimate
we may make. But something is gained, too, since one coefficient can be
compared with any other, regardless of the subject matter of the different
correlations. As has been stated, the coefficient of correlation is a
number varying from +1, through zero, to −1. The sign indicates whether
the slope of the line of relationship is positive or negative, while the
magnitude of the coefficient indicates the degree of association. When
there is absolutely no relationship between the variables, r is 0.

A clear understanding of the meaning of the coefficient of correlation
is given by the following approach. One measure of variability, called
variation or total variation, is the sum of the squares of the deviations of
the Y values from their mean, $\Sigma(Y - \bar{Y})^2$. This total variation can be
broken up into two parts: (1) that which has been explained by our line of
relationship, and (2) that which we have failed to explain. The total
variation in height growth of the trees of our distribution, as indicated
by the calculations in Column 8 of Table 19.2, is 208.55. The amount of
variation which we have explained by our line of relationship is the sum
of the squares of the deviations of the estimated Y values from their own
mean (which is also the mean of the original Y values, as may be seen by
dividing the totals of Columns 3 and 4 of Table 19.2 by N),⁵ that is,
$\Sigma(Y_c - \bar{Y})^2$. The explained variation is shown in Column 9 of Table
19.2 to be 119.81. The unexplained variation is the sum of the squares
of the deviations of the Y values from their estimated values, $\Sigma(Y - Y_c)^2$.
The unexplained variation is shown in Column 10 of Table 19.2
to be 88.75.

Let us summarize our findings: 


Variation      Symbol and formula                              Amount of     Per cent of
                                                               variation*    total variation

Unexplained    $\Sigma y_s^2 = \Sigma(Y - Y_c)^2$                  88.75          42.6
Explained      $\Sigma y_c^2 = \Sigma(Y_c - \bar{Y})^2$            119.81          57.4
Total          $\Sigma y^2 = \Sigma(Y - \bar{Y})^2$                208.55         100.0

* Because of rounding in Table 19.2, the two components slightly exceed the total. Later it will be
seen that $\Sigma y_s^2 = 88.74$.

It will be seen that we have explained 57.4 per cent of the variation
in the dependent variable. Expressed as a ratio to one, 0.574, this is the
coefficient of determination, $r^2$. The coefficient of correlation, r, is the
square root of the coefficient of determination and has a value of +0.758
(the sign being the same as that of b), and may be thought of as the
square root of the proportion of the total variation in the dependent
variable that has been explained by use of the estimating equation. r
will, of course, always be numerically larger than $r^2$ unless $r^2 = 0$ or 1.0,
when $r = r^2$.
One outstanding advantage of the foregoing method of explaining the
coefficient of determination and the coefficient of correlation is that the
concept will also serve to explain non-linear and multiple coefficients,
which are discussed in Chapters 20 and 21.
It may be helpful to some readers to be able to visualize the information
of Table 19.2. Chart 19.7 shows, for the data of height and diameter
growth:

A. The deviations of the actual Y values from their mean.

B. The deviations of the computed Y values from their mean. (Note again
that $\bar{Y}_c = \bar{Y}$.)

C. The deviations of the actual Y values from the computed Y values.

[Chart 19.7. Panels A (Total Deviations), B (Explained Deviations), and C.
Horizontal axis: diameter growth in inches. Vertical axis: height growth in feet.]

The proportion of variation which has been explained was 0.574. The
proportion which we failed to explain was 0.426. This is the coefficient of
non-determination.⁶ Note that under all conditions $r^2 + k^2 = 1.0$.
Note also that the maximum possible value for $r^2$ is 1.0 (when r is also
1.0); this would occur if all of the dots of the scatter diagram were on the
line of estimation, as in Chart 19.2. If no variation were explained, $r^2$
(and r) would be zero, since the estimating equation would coincide with $\bar{Y}$.

As can be seen from Table 19.2, or from the summary of findings, total
variation equals explained variation plus unexplained variation:⁷

$$\Sigma y^2 = \Sigma y_c^2 + \Sigma y_s^2;$$

$$208.55 = 119.81 + 88.75.$$

The equation may also be written

$$\Sigma y_c^2 = \Sigma y^2 - \Sigma y_s^2.$$

As computed in the preceding paragraphs,

$$r^2 = \frac{\Sigma y_c^2}{\Sigma y^2} = \frac{119.81}{208.55} = 0.574,$$

but we can also write⁸

$$r^2 = \frac{\Sigma y^2 - \Sigma y_s^2}{\Sigma y^2} = 1 - \frac{\Sigma y_s^2}{\Sigma y^2} = 1 - \frac{88.75}{208.55} = 1 - 0.426 = 0.574,$$

which is the same value obtained before.

⁵ See Appendix S, section 19.1, Equation 2.

⁶ While $r^2 + k^2 = 1.0$, $r + k > \pm 1.0$ unless $r = \pm 1.0$ or 0. k is called
the coefficient of alienation.

⁷ For algebraic proof, see Appendix S, section 19.1, Equation 7.

It was mentioned parenthetically, on page 462, that the sign of r is the
same as the sign of b in the estimating equation. The sign of r can also
be determined from inspection of the scatter diagram, unless the
correlation is very low. The methods previously described for determining the
value of $r^2$ or r were presented to explain the meaning⁹ of the coefficients.
They are too laborious to employ for day-to-day computations. Other
formulas, more useful for purposes of calculation, will be given further
on in this chapter.

⁸ Taking the square root gives the correlation coefficient:

$$r = \sqrt{1 - \frac{\Sigma y_s^2}{\Sigma y^2}}.$$

Reference will be made to this last expression later in the chapter.

⁹ The correlation coefficient may also be explained in this manner: If the two
variables X and Y are thought of as being composed of elements equally likely to be
present in any item (some of which are common to X and Y, but some of which occur
in the one and not the other), then the coefficient of determination of the entire
population is the product of the two proportions of common elements, and the
coefficient of correlation is their geometric mean. Let us take 5 disks (elements) marked
on one side as follows (the other side being blank):

[Diagram: two disks marked with both X and Y, two disks marked with X only,
and one disk marked with Y only.]

If we should throw all 5 disks into the air, when they fall, any number of X's from 0 to 4
might appear, and also from 0 to 3 Y's. Whenever an X appears, the chances that a
Y will also appear on the same disk are 2 out of 4; likewise, whenever a Y appears, the
chances are 2 out of 3 that an X will appear on the same disk. If we should throw
these disks into the air a number of times, counting the X's and Y's each time, there
would be correlation between the number of X's that appear from throw to throw and
the number of Y's. The most likely value of $r^2$ is $\frac{2}{4} \times \frac{2}{3} = 0.333$, while the most
likely value of r is $\sqrt{\frac{2}{4} \times \frac{2}{3}} = +0.58$. The larger the number of throws, the greater
will be the tendency for r to approach this value. For a demonstration of this, see
F. E. Croxton and D. J. Cowden, Practical Business Statistics, Prentice-Hall, Inc.,
New York, 1934 (first edition), pp. 410-419.
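The five-disk illustration in the footnote above can be checked without throwing anything, by listing all 32 equally likely ways the disks can fall. The sketch below assumes the marking implied by the footnote's odds (two disks carrying both X and Y, two carrying X only, one carrying Y only) and assumes each disk lands marked side up with probability one half; the names `DISKS` and `disk_correlation` are introduced here for illustration.

```python
from itertools import product
import math

# Assumed marking, inferred from the odds quoted in the footnote.
DISKS = ["XY", "XY", "X", "X", "Y"]


def disk_correlation(disks):
    """Correlation between the counts of X's and Y's over all equally likely falls."""
    xs, ys = [], []
    for faces in product((0, 1), repeat=len(disks)):  # 1 means marked side up
        xs.append(sum(up for up, marks in zip(faces, disks) if "X" in marks))
        ys.append(sum(up for up, marks in zip(faces, disks) if "Y" in marks))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


if __name__ == "__main__":
    r = disk_correlation(DISKS)
    print(f"r squared = {r * r:.3f}, r = {r:.2f}")
```

Enumerating every outcome gives the population value that repeated throws would approach, so no random sampling is needed.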

The prodiict-momeiit formula. The coefficient of correlation may 
be approached from a number of different points of view. As noted 
before, the explanation already given is particularly englightening, since 
essentially the same idea can be applied to curvilinear and to multiple 
correlation. But the following explanation is also simple and, for certain 
purposes, extremely useful. 

In the estimating equation, b tells us the normal amount by which the
dependent variable changes with a change of one unit in the independent
variable. It is the slope or \(y/x\) ratio of any point on the estimating equation,
when y and x are defined as deviations from the mean of the series,
so that the estimating equation becomes \(y_c = bx\), and b is obtained by
finding¹⁰ the value of \(\Sigma xy / \Sigma x^2\). Although this constant b is essential for
purposes of estimation, still it cannot tell us the degree of relationship
between the variables, since they are not directly comparable with each
other. The X series and the Y series do not have the same dispersion,
and they may even be in different physical units. However, comparability
between the terms of the ratio \(y/x\) can be obtained by dividing the
numerator by \(s_Y\) and the denominator by \(s_X\) or by dividing the entire
expression by \(s_Y / s_X\). Thus, b is transformed into r as follows:¹¹

\[ r = \frac{\Sigma xy}{\Sigma x^2} \div \frac{s_Y}{s_X} = \frac{\Sigma xy}{\Sigma x^2} \cdot \frac{s_X}{s_Y} = \frac{(\Sigma xy)(s_X)}{(\Sigma x^2)(s_Y)} = \frac{\Sigma xy}{N s_X s_Y} = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \, \Sigma y^2}}. \]

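¹⁰ See Appendix S, section 19.2.

¹¹ Another way of getting the same result is to think of r as a special case of b;
namely, when the original data have been made comparable, by expressing them in
units of their own standard deviations. Thus,

\[ b = \frac{\Sigma xy}{\Sigma x^2} \]

becomes

\[ r = \frac{\Sigma \left( \frac{x}{s_X} \right) \left( \frac{y}{s_Y} \right)}{\Sigma \left( \frac{x}{s_X} \right)^2} = \frac{\Sigma xy}{s_X s_Y} \cdot \frac{s_X^2}{\Sigma x^2} = \frac{\Sigma xy}{N s_X s_Y}. \]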


The formula is often stated as r = The reason for the adjective 

product-moment becomes clear when it is realized that the word moment refers to the
average of some power of the deviations from a mean. Thus, r is the first moment of

In either of the last two forms, the ratio is known as the product-moment
form of the coefficient of correlation. Thus it may be seen that r is
merely the slope of the estimating equation when both numerator and
denominator are in standard deviation units.

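Now, since

\[ r = b \div \frac{s_Y}{s_X}, \]

\[ b = r \, \frac{s_Y}{s_X}, \]

and the estimating equation \(y_c = bx\) may be written

\[ y_c = r \, \frac{s_Y}{s_X} \, x. \]

Use of the estimating equation in this form will be made later in the
chapter.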


PRACTICAL METHODS OF COMPUTATION 

The previous illustration involved a limited number of paired items in 
order to illustrate the theory of correlation as concisely as possible. In 
most practical problems, however, we have a large number of pairs of 
items. In practice, therefore, it is advisable to modify the foregoing 
methods slightly in order to save time. 

As a preliminary step in a correlation problem, a scatter diagram 
should always be drawn. If only an approximate idea of the degree of 


the product of the variables when each has been previously stated in terms of its own 
standard deviation. For proof that 


Sy* 

see Appendix S, section 19.3. 

¹² No previous mention has been made of the estimating equation \(X_c = a' + b'Y\),
which minimizes the squared horizontal deviations. For this equation, the normal
equations are:

I. \(\Sigma X = Na' + b'\Sigma Y\),

II. \(\Sigma XY = a'\Sigma Y + b'\Sigma Y^2\).

In the form \(x_c = b'y\),

\[ b' = \frac{\Sigma xy}{\Sigma y^2} \quad \text{and} \quad x_c = r \, \frac{s_X}{s_Y} \, y. \]

In the portions of this text dealing with linear correlation, we shall give exclusive
attention to problems involving the estimating equation \(Y_c = a + bX\). There are
situations in which the estimating equation \(X_c = a' + b'Y\) is appropriate and still
other situations calling for estimating equations differing from either of these. For a
discussion, see "One Line or Two," by W. N. Jessop, Applied Statistics, A Journal of
the Royal Statistical Society, Vol. I, No. 2, June 1952, pp. 131-137. Eight references
are given at the end of this article.

relationship is required, inspection of the scatter plot yields satisfactory
results. After a little experience in correlating, the statistician may be 
able to make surprisingly close estimates of r, by inspection, from the 
scatter diagram, and these may be good enough to help him to detect 
gross mistakes in computations of r. The scatter diagram may fre- 
quently be used for exploratory purposes and may occasionally yield 
sufficient information to eliminate the need for determining the coefficient 
of correlation. 

We have already seen that

\[ b = \frac{\Sigma xy}{\Sigma x^2}. \]

Since the first normal equation is

\[ \Sigma Y = Na + b\Sigma X, \]

\[ \frac{\Sigma Y}{N} = a + b \, \frac{\Sigma X}{N}, \text{ and} \]

\[ a = \bar{Y} - b\bar{X}. \]

From these expressions, a and b may be obtained without solving the two
normal equations simultaneously. We must, however, compute:¹³

\[ \bar{X} = \frac{90.7}{20} = 4.535. \qquad \bar{Y} = \frac{173}{20} = 8.65. \]

\[ \Sigma xy = \Sigma XY - \bar{X}\Sigma Y = 856.0 - (4.535)(173) = 71.445. \]

\[ \Sigma x^2 = \Sigma X^2 - \bar{X}\Sigma X = 453.93 - (4.535)(90.7) = 42.6055. \]

\[ \Sigma y^2 = \Sigma Y^2 - \bar{Y}\Sigma Y = 1{,}705 - (8.65)(173) = 208.55. \]

¹³ For proof of the expressions for the summations, see footnote 3 in Chapter 21.

The last summation will be needed later.
Then we obtain

\[ b = \frac{\Sigma xy}{\Sigma x^2} = \frac{71.445}{42.6055} = 1.676896; \]

\[ a = \bar{Y} - b\bar{X} = 8.65 - (1.676896)(4.535) = 1.045277, \]

giving the estimating equation

\[ Y_c = 1.045 + 1.677X. \]
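The short-cut just described can be checked mechanically. The following Python sketch is offered only as an illustration; the variable names are our own, and the totals are those quoted above for the 20 paired items.

```python
# Totals quoted in the text for the 20 paired items.
N = 20
sum_x, sum_y = 90.7, 173.0
sum_xy, sum_x2 = 856.0, 453.93

mean_x = sum_x / N                    # 4.535
mean_y = sum_y / N                    # 8.65

# Sums of products and of squares of deviations from the means.
dev_xy = sum_xy - mean_x * sum_y      # Sigma xy
dev_x2 = sum_x2 - mean_x * sum_x      # Sigma x^2

b = dev_xy / dev_x2                   # slope of the estimating equation
a = mean_y - b * mean_x               # intercept

print(f"Y_c = {a:.3f} + {b:.3f} X")
```

Because b is carried here to full machine precision rather than rounded to six decimals, the last decimal of a may differ slightly from the figure shown above.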

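Next we compute \(\Sigma Y_c^2\) by use of the expression¹⁴

\[ \Sigma Y_c^2 = a\Sigma Y + b\Sigma XY = (1.045277)(173) + (1.676896)(856.0) = 1{,}616.26, \]

and \(\Sigma y_s^2\) from

\[ \Sigma y_s^2 = \Sigma Y^2 - \Sigma Y_c^2 = 1{,}705 - 1{,}616.26 = 88.74. \]

We may compute either

\[ \Sigma y_c^2 = a\Sigma Y + b\Sigma XY - \bar{Y}\Sigma Y = (1.045277)(173) + (1.676896)(856.0) - (8.65)(173) = 119.81, \]

or

\[ \Sigma y_c^2 = b\Sigma xy = (1.676896)(71.445) = 119.81, \]

and obtain \(\Sigma y_s^2\) from the alternative expression

\[ \Sigma y_s^2 = \Sigma y^2 - \Sigma y_c^2 = 208.55 - 119.81 = 88.74. \]

A convenient formula for obtaining \(s_{Y.X}\) is

\[ s_{Y.X}^2 = \frac{\Sigma y_s^2}{N} = \frac{88.74}{20} = 4.437, \]

and

\[ s_{Y.X} = 2.106 \text{ feet.} \]

The coefficient of correlation is then obtained by the usual expression

\[ r^2 = \frac{\Sigma y_c^2}{\Sigma y^2} = \frac{119.81}{208.55} = 0.574, \]

and

\[ r = +0.758. \]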
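The chain of computations above may be sketched in Python as follows. This is a minimal illustration under our own naming; the constants a and b are taken, already rounded, from the text, and the sign of r is borrowed from b because the square root alone cannot supply it.

```python
import math

N = 20
sum_y, sum_xy, sum_y2 = 173.0, 856.0, 1705.0
a, b = 1.045277, 1.676896                  # constants of the estimating equation

sum_yc2 = a * sum_y + b * sum_xy           # Sigma Y_c^2
unexplained = sum_y2 - sum_yc2             # Sigma y_s^2
total = sum_y2 - (sum_y / N) * sum_y       # Sigma y^2
explained = total - unexplained            # Sigma y_c^2

s_yx = math.sqrt(unexplained / N)          # standard error of estimate
r = math.copysign(math.sqrt(explained / total), b)

print(f"s_Y.X = {s_yx:.3f}, r = {r:+.3f}")
```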

¹⁴ Proof that \(\Sigma Y_c^2 = a\Sigma Y + b\Sigma XY\) is given in Appendix S, section 19.1, Equation 2.
Proof that \(\Sigma y_s^2 = \Sigma Y^2 - \Sigma Y_c^2\) is given in the same section, Equation 5. For proof
that \(\Sigma y_c^2 = b\Sigma xy\), see Equation 6. For proof that \(\Sigma y_s^2 = \Sigma y^2 - \Sigma y_c^2\), see Equation 7.

If preferred, r may be obtained by use of one of the expressions given in
footnote 8.

If all that is wanted is the value of r, it is most expeditious to make use
of a formula which does not call for the value of a or b. It has previously
been noted that

\[ r = \frac{\Sigma xy}{N s_X s_Y}. \]

By substituting \(X - \bar{X}\) for x and \(Y - \bar{Y}\) for y and simplifying, this
becomes¹⁵

\[ r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}. \]

Entering the necessary values from Table 19.1 gives:

\[ r = \frac{(20)(856.0) - (90.7)(173)}{\sqrt{[(20)(453.93) - (90.7)^2][(20)(1{,}705) - (173)^2]}} = +0.758. \]

Note that this expression automatically supplies the sign for r.
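As an illustration only, the direct formula can be written as a small Python function; the function name and argument names are ours, not the text's. The function takes the six totals and returns r with its sign.

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Coefficient of correlation from raw totals; no a or b required."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Totals quoted in the text from Table 19.1.
r = pearson_r_from_sums(20, 90.7, 173.0, 856.0, 453.93, 1705.0)
print(f"r = {r:+.3f}")
```

The sign comes from the numerator alone, since the denominator is a positive square root.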

SOME CAUTIONS 

Correlation and causation. The coefficient of correlation must be 
thought of, not as something that proves causation, but only as a measure 
of co-variation. Any one of the following situations may, in fact, obtain : 

1. A variation in either variable may be caused (directly or indirectly) by
a variation in the other. The variable that is supposed to be the cause of
variations in the other is usually taken as the independent variable and
plotted along the X-axis. Thus, because dividends on stocks are thought
to affect stock prices, rather than vice versa, a "dividends" series would
be made the independent variable. It is a logical process which determines
the statistician's belief that there is causal relationship between the
two variables, and his belief as to which is cause and which is effect. It
must be evident, then, that the coefficient of correlation in itself does not
say that X causes Y, any more than it says that Y causes X.

¹⁵ For derivation of this expression, see Appendix S, section 19.4. Having obtained
r by the expression above, it is possible to get the estimating equation and \(s_{Y.X}\) from
the formulas used with correlation of grouped data:

\[ Y_c - \bar{Y} = r \, \frac{s_Y}{s_X} \, (X - \bar{X}) \]

and

\[ s_{Y.X} = s_Y \sqrt{1 - r^2}. \]

2. Co-variation of the two variables may be due to a common cause or
causes affecting each variable in the same way, or in opposite ways. If it
should be found that there is correlation between automobile accidents
per 1,000 persons and per capita federal income tax payments, it should
not hastily be concluded that it takes an automobile accident to jar a
person into paying his income tax; nor is it necessarily true that making
large tax payments incapacitates a person for driving carefully. It is
quite possible, however, that in states where the average income is high,
the per capita income tax will be large, a large proportion of the people
will own automobiles, and accidents will be numerous.

3. The causal relationship between the two variables may be a result of
interdependent relationships. Thus, a high price for a commodity
stimulates its production; but increased production may increase or
decrease the cost of a commodity, depending upon the period of time
under observation and whether it is an increasing- or decreasing-cost
industry, and through the change in cost the price will be affected.

4. The correlation may be due to chance. Even though there may be no
relationship whatever between the variables in the universe from which
the sample is drawn, it may be that enough of the paired variables that
are selected may vary together, just by chance, to give a fair degree of
correlation. Thus it might be found that, in a given group of male
students, there was positive correlation between the size of their shoes
and the amount of money in their pockets. Yet it is hard to develop a
theory as to why this should be so, and the chances are that another
sample would yield quite different results. In Chapter 26 brief attention
will be given to measurement of the reliability of r.

Heterogeneity.¹⁶ In observational data, heterogeneity in a frequency
distribution may often be spotted by bi-modality or the presence
of a few items which are too far out of line with the other items to be
considered a matter of chance. On the scatter diagram, such heterogeneity
may show up as a tendency for the dots to cluster into two or more
groups, or for one or more dots to be far removed from the others on the
chart. Where heterogeneity is observed, it is better to classify the data
on some rational basis and correlate each group separately. Individual
items clearly governed by a different set of causes should be eliminated
before correlating. If these common-sense steps are not taken, one may
obtain a misleading impression, not only as to the degree of correlation,
but sometimes even as to its sign.

¹⁶ In the following paragraphs the material dealing with heterogeneity is based on
a discussion of the same topic in F. E. Croxton and D. J. Cowden, Practical Business
Statistics, Second Edition, Prentice-Hall, Inc., New York, 1948, Chapter 14; and in F. E.
Croxton, Elementary Statistics with Applications in Medicine, Prentice-Hall, Inc., New
York, 1953, Chapter 6. Charts 19.8, 19.9, and 19.10 are also from the latter book.
The treatment of errors of measurement, use of averages, non-linear relationship,
and elimination of relevant data is based on similar material in the former volume.

Chart 19.8A is an illustrative scatter diagram showing low correlation.
In Chart 19.8B, the two component groups are shown by means of different
symbols, and it is seen that two fairly high correlations are present.
It is also possible that two different groups, each having little or no
correlation, could be so located on a scatter plot that, if they were combined,
moderate positive (or negative) correlation would appear to be
present.

Chart 19.8A. Illustrative Scatter Diagram Showing Low Correlation,
Two Dissimilar Groups Not Identified. From F. E. Croxton, Elementary
Statistics with Applications in Medicine, Prentice-Hall, Inc., New York,
1953, p. 128.

Chart 19.8B. Same Scatter Diagram as in Chart 19.8A, But Indicating
Fairly High Correlation for Each of Two Dissimilar Groups,
Shown by Crosses and Dots. From the same source as Chart 19.8A.

Another sort of heterogeneity is shown in Chart 19.9. There are nine
clustered dots in Chart 19.9 which show low correlation, r = +0.32, and
one dot far removed from the others. For all ten dots, r = +0.79. The
presence of a single, almost certainly non-homogeneous (or, at least,
non-comparable) observation such as this may result in an even higher
correlation coefficient when little or no correlation exists for the other
observations. It is altogether possible that Chart 19.9 illustrates also
the sort of heterogeneity mentioned in the preceding paragraph; the
upper four dots of the cluster of nine may represent a category different
from that represented by the lower five dots. In any event, the investigator
should look into that possibility.

It should be fairly obvious that the reverse of the situation shown in
Chart 19.9 might also occur. That is to say, a cluster of dots might show
high correlation, but one extreme dot might be so located that its inclusion
with the others would result in low correlation. Chart 19.10 shows a
situation in which a low correlation is made even lower through the
inclusion of an extreme pair of values; r is decreased from
+0.348 to +0.290.
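The effect of a single atypical item is easily demonstrated. The Python sketch below uses hypothetical values of our own choosing (not the data of Chart 19.9 or Chart 19.10): nine clustered pairs arranged so that their correlation is exactly zero, and a tenth pair far removed from the cluster.

```python
import math

def pearson_r(xs, ys):
    """Product-moment r from paired values, using the raw-totals formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sx2 - sx ** 2) * (n * sy2 - sy ** 2)
    )

# Hypothetical cluster: a 3 x 3 grid of points, which has no correlation.
cluster_x = [1, 2, 3, 1, 2, 3, 1, 2, 3]
cluster_y = [1, 1, 1, 2, 2, 2, 3, 3, 3]

r_cluster = pearson_r(cluster_x, cluster_y)

# Add one hypothetical extreme pair in the upper right corner.
r_all = pearson_r(cluster_x + [10], cluster_y + [10])

print(f"nine clustered pairs: r = {r_cluster:+.2f}")
print(f"all ten pairs:        r = {r_all:+.2f}")
```

With these assumed values, the one extreme pair raises r from zero to about +0.9, which is why the scatter diagram should be inspected before r is computed.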

Chart 19.9. Scatter Diagram Illustrating a Type of Heterogeneity.
The correlation is increased because of the presence of an atypical
item in the upper right corner. This chart is drawn from actual data, the
source and nature of which are withheld. Chart from page 129 of the
source given below Chart 19.8A.

Chart 19.10. Scatter Diagram Illustrating a Type of Heterogeneity.
The correlation is decreased because of the presence of a possibly atypical item
at the top of the chart. The data represent the I.Q.'s of 26 fraternal twins
of unlike sex and are from A. H. Wingfield, Twins and Orphans, J. M. Dent and
Sons, Ltd., London and Toronto, 1928, pp. 121-12^. Chart from page 111 of the
reference given below Chart 19.8A.

Errors of measurement. Since errors in the measurement of
the two variables are ordinarily not correlated, such errors reduce the
size of r below its true value. Such attenuation can be corrected if the
magnitude of the errors is known.¹⁷

Use of averages. If the data to be correlated are first grouped into
a number of size groups according to the independent variable, if \(\bar{X}\) and
\(\bar{Y}\) are computed for each group, and if these means are correlated, the
correlation among the means will be higher than among the individual
items taken as a whole (unless r = 1.0 for the ungrouped data). This is
so because there is now no dispersion of the actual values around the
various column means. Likewise, if the grouping and averaging is done
for a number of rows of the dependent variable, the correlation will be
increased. If the data are grouped according to both variables, so that
there is a number of cells, and if \(\bar{X}\) and \(\bar{Y}\) are computed for each cell and
these paired cell means (rather than their mid-values) correlated, the
correlation will be increased. The increase will be unimportant provided
there is a large number of cells. As an illustration, the correlation of
state averages will ordinarily be higher than that of the county values.

¹⁷ See J. P. Guilford, Fundamental Statistics in Psychology and Education, McGraw-
Hill Book Co., Inc., New York, 1942, pp. 287-288.

Non-linear relationship. If inspection of the scatter diagram
reveals that a curved line could more appropriately be fitted to the data 
than a straight line, r is a misleading measure, understating the closeness 
of the relationship. A curved line should be fitted, and a coefficient of 
non-linear correlation should be computed, following the procedure 
explained in Chapter 20. So doing will yield a higher coefficient and one 
which reflects more accurately the closeness of the relationship. Some- 
times it may be better to transform one or both of the variables into loga- 
rithms, reciprocals, or some other function before correlating. 

Elimination of relevant data. For instance, if retail sales and payrolls
are correlated for cities ranging from 100,000 to 500,000 population,
the correlation will usually not be so high as if cities from 10,000 to
5,000,000 are included. This is so because retail sales and payrolls are
both positively correlated with population; and, when the range of values
along both axes is extended, \(\Sigma y^2\) is increased without a corresponding
increase in \(\Sigma y_s^2\). For data of this type, one must remember to guard
against heterogeneity of the type illustrated in Chart 19.10. Consider 
also a different situation: if placement scores were correlated with 
monthly earnings for workers having two to five years’ experience, the 
correlation might be higher than if all employees of this type were 
included, since earnings generally vary directly with experience, while 
placement scores are not necessarily correlated positively with experience. 

CORRELATION OF GROUPED DATA 

When the number of pairs of items to be correlated is large, time is
saved if the data are grouped before calculations are undertaken. First
the data are tallied,¹⁸ as in Table 19.3, which shows the relationship
between per cent of farms with value of products sold of more than $4,000
and per cent of farms with automobiles, by counties. This table resembles
a scatter diagram except that each point, instead of being plotted
exactly, is merely entered in the appropriate cell. Thus, a county with
5 per cent of farms having value of products sold of more than $4,000, and
with 25 per cent of farms having automobiles, would be tallied in the
extreme lower left corner.

¹⁸ Sorting, instead of tallying, may be easier and less subject to error. This is
particularly true if the data are on cards or if punch-card equipment is available.

TABLE 19.3

Tabulation of Per Cent of Farms with Value of Products Sold of More
Than $4,000 and Per Cent of Farms with Automobiles, for a Sample
of 169 Counties, 1950

(See text, below, for description of population and method of sampling.)


[Tally table not recoverable from the source. Rows: per cent of farms with automobiles (Y). Columns: per cent of farms with value of products sold over $4,000 (X).]



Data from Country Gentleman's Farm Market Data Book. The figures are based on the 1950 Census
of Agriculture.

The following states, and those south of them, were excluded from the
population sampled: Oklahoma, Arkansas, Kentucky, West Virginia, and
Virginia. It was believed that these states were affected by a system
of causes different from the other states. For the same reason, the
District of Columbia was also excluded, and all counties included in
"Standard Metropolitan Areas" by the Bureau of the Census. The
sample was obtained by the following procedure: The states were listed
in alphabetical order, and also the counties within each state. The
counties of all states (including those mentioned above) were then numbered,
from 1 through 3,070. A digit, selected at random, turned out to
be 5, and all county numbers in the population being studied with a terminal
digit of 5 were selected. If a county so selected was a metropolitan
area, the county with the number closest to it was substituted. This was
taken, arbitrarily, as a county with a terminal digit of 6 rather than 4
where a choice had to be made. We thus have a 10 per cent systematic
sample, stratified by states, with approximately proportionate representation
of counties for each state.¹⁹

Table 19.4 is a correlation table. The figures in the center of each cell
are taken from Table 19.3. The \(f_Y\) values are obtained by adding the
numbers horizontally; the \(f_X\) values, by adding vertically. These two
sets of figures will be recognized as frequency distributions of the dependent
and independent variables, respectively. The total frequencies, or
counties, N, for each distribution are, of course, the same: 169. The three
other columns and rows in the table are identical with those to which
we are accustomed for computing the mean and standard deviation from
a frequency distribution, except that here we have two frequency distributions,
one of the X values (running horizontally) and another of the Y
values (running vertically). For ease in computation, deviations are
measured in terms of class intervals from assumed means, that of X
being chosen as 8 per cent and that of Y as 6.5 per cent.

Since xy values are required for r, these also are computed for each
cell and totaled. This is done by multiplying the X deviation by the Y
deviation (shown in the upper part of each cell), and finally multiplying
this product by the appropriate frequency. The results are shown in
boldface type in the lower part of each cell. It will be noticed that the
products in the first and third quadrants are positive, while those in the
second and fourth are, of course, negative. The algebraic total of these
products is shown in the lower right-hand corner of the table. There is no
subscript for f in the expression \(\Sigma f d'_X d'_Y\), since each cell frequency is
common to an X class and to a Y class.

When correlating grouped data, it is most expeditious to compute r
first, after which the estimating equation and the standard error of
estimate may be obtained.²⁰

To obtain r directly from ungrouped data, the following formula was
used:

\[ r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}. \]

For grouped data, X is replaced by \(d'_X\) and Y by \(d'_Y\), the symbol f is introduced,
and the expression becomes

\[ r = \frac{N\Sigma f d'_X d'_Y - (\Sigma f_X d'_X)(\Sigma f_Y d'_Y)}{\sqrt{[N\Sigma f_X d'^2_X - (\Sigma f_X d'_X)^2][N\Sigma f_Y d'^2_Y - (\Sigma f_Y d'_Y)^2]}}. \]

¹⁹ A more laborious, but slightly better, plan would be to use systematic sampling
with probability proportionate to size. Size would be measured by the number of
farms in a county. The variability in the number of farms per county is not sufficient,
however, to make this more laborious procedure worth while.

²⁰ It is, of course, possible to set up the two normal equations and obtain the estimating
equation first. For the method of doing this, see the first edition of this text,
p. 675 and pp. 856-857.




TABLE 19.4

Correlation Table of Per Cent of Farms with Value of Products Sold of More Than
$4,000 (X) and Per Cent of Farms with Automobiles (Y), for a Sample of 169
Counties, 1950

Data from Table 19.3.

Substituting in this formula, we have

\[ r = \frac{(169)(451) - (-48)(183)}{\sqrt{[(169)(1{,}194) - (-48)^2][(169)(955) - (183)^2]}} = \frac{85{,}003}{\sqrt{(199{,}482)(127{,}906)}} = +0.5322. \]
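The substitution may be verified with a few lines of Python. This is only a sketch; the variable names are ours, and the five totals are those entered in the formula above (deviations being in class-interval units from the assumed means).

```python
import math

N = 169
sum_f_dx_dy = 451            # Sigma f d'_X d'_Y
sum_f_dx, sum_f_dy = -48, 183
sum_f_dx2, sum_f_dy2 = 1194, 955

numerator = N * sum_f_dx_dy - sum_f_dx * sum_f_dy
bracket_x = N * sum_f_dx2 - sum_f_dx ** 2
bracket_y = N * sum_f_dy2 - sum_f_dy ** 2

r_grouped = numerator / math.sqrt(bracket_x * bracket_y)
print(f"r = {r_grouped:+.4f}")
```

Since r is a pure number, the use of class-interval units and assumed means does not affect the result.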

The following measures are readily computed from values shown in Table
19.4 by methods already familiar to the reader:

\[ \bar{X} = 41.678. \qquad \bar{Y} = 77.738. \]

\[ s_X = 21.144. \qquad s_Y = 13.754. \]

To obtain the estimating equation, we use

\[ Y_c - \bar{Y} = r \, \frac{s_Y}{s_X} \, (X - \bar{X}). \]

Substituting in this equation, we have

\[ Y_c - 77.738 = 0.5322 \, \frac{13.754}{21.144} \, (X - 41.678), \text{ or} \]

\[ Y_c = 63.309 + 0.3462X. \]
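A minimal Python sketch of this step follows, using the rounded values of r, the means, and the standard deviations given above; the names are our own.

```python
r = 0.5322
mean_x, mean_y = 41.678, 77.738
s_x, s_y = 21.144, 13.754

slope = r * s_y / s_x                  # b in Y_c = a + bX
intercept = mean_y - slope * mean_x    # a in Y_c = a + bX

print(f"Y_c = {intercept:.3f} + {slope:.4f} X")
```

The line necessarily passes through the point of means, which affords a quick check on the arithmetic.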

Now since, as shown in footnote 8,

\[ s_{Y.X}^2 = s_Y^2 (1 - r^2), \text{ and} \]

\[ s_{Y.X} = s_Y \sqrt{1 - r^2}. \]

Substituting gives:

\[ s_{Y.X} = 13.754 \sqrt{1 - (0.5322)^2} = 11.66. \]

Effect of grouping. The values obtained from the grouped data
are not exactly the same as would have been obtained had the computations
been based upon ungrouped data. Although the difference is
ordinarily slight if there are at least 12 groups in each direction, the coefficient
of correlation computed from the grouped data tends to be too small.
It will be recalled that one formula for the correlation coefficient is

\[ r = \frac{\Sigma xy}{N s_X s_Y}. \]

The errors from grouping tend to offset each other in the numerator, provided
the X and Y distributions are approximately symmetrical. However,
the standard deviations in the denominator tend to be too large, and
Sheppard's correction should be used if the conditions under which this
correction is appropriate are met.

If the 169 items are correlated, ungrouped r = +0.5499 which is, of
course, higher than the value of r = +0.5322 for the grouped data of
Table 19.4. If Sheppard's correction is applied (by subtracting \(N^2/12\) from
each expression enclosed in brackets in the formula for r for grouped data),
r is found to be +0.5404. Actually, the validity of the use of Sheppard's
correction for these data is open to doubt, since both series are of limited
range.
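The effect of Sheppard's correction on the grouped-data formula can be sketched as follows. The quantity subtracted, N²/12, is what a reduction of 1/12 in each variance (in class-interval units) amounts to inside the brackets; the variable names are ours.

```python
import math

N = 169
numerator = 85003                        # N*Sigma f d'X d'Y - (Sigma f d'X)(Sigma f d'Y)
bracket_x, bracket_y = 199482, 127906    # bracketed expressions in the denominator

r_grouped = numerator / math.sqrt(bracket_x * bracket_y)

# Sheppard's correction: subtract N^2 / 12 from each bracketed expression.
correction = N ** 2 / 12
r_corrected = numerator / math.sqrt(
    (bracket_x - correction) * (bracket_y - correction)
)

print(f"grouped r   = {r_grouped:+.4f}")
print(f"corrected r = {r_corrected:+.4f}")
```

The numerator is left unchanged, in keeping with the statement that grouping errors tend to offset each other there.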


CORRELATION OF RANKED DATA 

Sometimes statistical series are composed of items the exact magnitude 
of which cannot be ascertained but which are ranked according to size. 
Thus, in Column 2 of Table 19.5, we have listed 11 basketball teams in 
order of their United Press rankings, as of March 2, 1953. In Column 3 
we have listed the same teams in order of their Associated Press rankings. 
The table includes all the teams that were ranked in the first 10 by either 
organization. The U.P. rankings were made on the basis of votes by 
basketball coaches, while the A.P. rankings resulted from preferential 
ballots submitted by sports writers and broadcasters. We wish to determine
the extent of agreement between the two sets of authorities.

Since the coefficient of correlation previously explained is not designed
to deal with ranked data, we shall use Spearman's rank correlation coefficient,
the formula for which is

\[ r_{rank} = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)}, \]

in which D refers to the difference in rank between paired items in the two
series. In Table 19.5, it will be seen that the sum of the positive differences
equals the sum of the negative differences, and thereby provides a
check on the accuracy of the subtractions. Substituting the values in
the formula, we have

\[ r_{rank} = 1 - \frac{6(22)}{(11)(121 - 1)} = +0.9. \]
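The rank computation, including the check on the subtractions, may be sketched in Python. The two lists reproduce the U.P. and A.P. rankings of Table 19.5 (North Carolina State being shown as eleventh on the A.P. list, as explained in the note to that table); the variable names are ours.

```python
# Rankings of the 11 teams, in the order listed in Table 19.5.
up_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
ap_rank = [1, 3, 4, 2, 6, 5, 7, 11, 8, 10, 9]

differences = [u - a for u, a in zip(up_rank, ap_rank)]

# Check: positive and negative differences must balance.
positive_total = sum(d for d in differences if d > 0)
negative_total = -sum(d for d in differences if d < 0)

n = len(up_rank)
sum_d2 = sum(d * d for d in differences)
r_rank = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(f"Sigma D^2 = {sum_d2}, r_rank = {r_rank:+.1f}")
```

This sketch assumes there are no ties; tied items would first have to be given split ranks.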


The formula gives the sign of the correlation coefficient, positive in this
case. Whenever there is a tie in rank, the two or more positions should
be split among the different items. Thus, had Seton Hall and Washington
tied for second and third in U.P. rankings, each would have been
ranked 2.5; while if Seton Hall, Washington, and LaSalle had tied for
second, third, and fourth, each would have received a rank of 3.

Two paired series of values are sometimes converted into ranks and \(r_{rank}\)
computed to provide a quick estimate of r for the paired values. For
instance, one might rank American League outfielders according to their
batting averages and according to their fielding records and correlate

TABLE 19.5

Computation of Values for Correlation of Ranked Data: Basketball
Team Rankings by Two News Services, March 2, 1953

                              Ranking        Difference in rank, D:
                                              Col. (2) - Col. (3)
Team                       U.P.    A.P.         +         -          D²
(1)                        (2)     (3)         (4)       (5)         (6)

Indiana                      1       1
Seton Hall                   2       3                    1           1
Washington                   3       4                    1           1
LaSalle                      4       2          2                     4
Kansas                       5       6                    1           1
Louisiana State              6       5          1                     1
Oklahoma A. & M.             7       7
North Carolina State         8      11                    3           9
Kansas State                 9       8          1                     1
Illinois                    10      10
Western Kentucky            11       9          2                     4

Total                                           6         6          22

Data from Durham Morning Herald, March 3, 1953, Section II, p. 8. North Carolina State was
actually ranked twelfth in the A.P. list, but Oklahoma City (not included above), which was ranked
eleventh by A.P., was only eighteenth on the U.P. list. For purposes of illustration, North Carolina
State is shown as eleventh on the A.P. list.

these two sets of ranks. While \(r_{rank}\) may be computed more quickly than
r, some time must always be spent in ranking the data. Also, it is well
to remember that, if one wants only a rough estimate of the degree of
correlation present, it may be had from a scatter diagram of the original
values.

The reason the rank method is not so accurate as the ordinary method
is that all of the information concerning the data is not utilized. Thus,
the first differences of the values of the items in a series arranged in order
of magnitude are almost never constant; usually these differences become
smaller toward the middle of the array. If such first differences were
constant, then r and \(r_{rank}\) would give identical results.²¹ If the values,
however, are distributed normally, there may be applied to \(r_{rank}\) a correction
which will give the same result that would be obtained by computing
r directly.²² These corrections always serve to increase the correlation;
however, they are very small, in no case increasing the correlation
by so much as 0.02. Furthermore, the correction is not always appropriate.
In the present illustration, we have only the upper tails of
(possibly) normal distributions; if plotted, the data might appear as
reverse-J distributions.

²¹ For proof, see Appendix S, section 19.5.

CORRELATION OF DATA IN 2 X 2 TABLES 

Data are often encountered which fall into a dichotomous classification
on each axis. Sometimes a correlation coefficient may be desired²³ for
such a "2 X 2" table.

Table 19.6 shows data of the academic rank and academic output of
36 teachers in a department of a state university in 1951. Is there correlation
between academic rank and academic output, as shown by the
data of Table 19.6?

One method of obtaining a correlation coefficient for a 2 X 2 table
consists of applying the product-moment formula. If we designate the
values in a 2 X 2 table thus:

     a_1          b_1          a_1 + b_1
     a_2          b_2          a_2 + b_2
  a_1 + a_2    b_1 + b_2           N

it may be shown that the product-moment formula becomes²⁴

\[ r = \frac{a_1 b_2 - a_2 b_1}{\sqrt{(a_1 + b_1)(a_2 + b_2)(a_1 + a_2)(b_1 + b_2)}}. \]

For Table 19.6 we obtain

\[ r = \frac{(10)(13) - (5)(8)}{\sqrt{(18)(18)(15)(21)}} = \frac{130 - 40}{\sqrt{102{,}060}} = \frac{90}{319.5} = +0.282. \]
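By way of illustration, the 2 X 2 form of the product-moment formula may be written as a short Python function. The function and argument names are ours; the arguments follow the cell designations a_1, b_1, a_2, b_2 used above, and the figures are those of Table 19.6.

```python
import math

def r_two_by_two(a1, b1, a2, b2):
    """Product-moment r for a 2 X 2 table laid out as [[a1, b1], [a2, b2]]."""
    numerator = a1 * b2 - a2 * b1
    denominator = math.sqrt((a1 + b1) * (a2 + b2) * (a1 + a2) * (b1 + b2))
    return numerator / denominator

# Table 19.6: rows are academic rank (high, low); columns are output (high, low).
r = r_two_by_two(10, 8, 5, 13)
print(f"r = {r:+.3f}")

# Reversing one dichotomy only (interchanging the two columns) changes the sign.
r_reversed = r_two_by_two(8, 10, 13, 5)
```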


²² Tables of corrected values of \(r_{rank}\) are given in some textbooks. See, for instance,
R. E. Chaddock, Principles and Methods of Statistics, Houghton Mifflin Company,
Boston, 1925, p. 300 and Appendix E.

²³ Table 25.6 is a 2 X 2 table for which a correlation coefficient was not desired.
However, the chi-square analysis discussed in Chapter 25 could be applied to the
data of Table 19.6.

²⁴ The formula given above results from a simplification of the numerator of the
expression developed in G. U. Yule and M. G. Kendall, An Introduction to the Theory
of Statistics, Charles Griffin and Co., London, 1940 (12th ed., revised), pp. 252-253.
The development assumes that only two values are possible for each variable. This
is true of both variables in Table 25.6. In Table 19.6 it is true of academic rank,
since the two categories may be thought of as "full professor" and "not full professor."
It is not true of academic output.

This expression will not yield a meaningful sign for r unless the two
dichotomies are arranged as in Table 19.6, or unless both dichotomies are
reversed; reversing only one changes the sign.

TABLE 19.6

Academic Rank and Academic Output of 36
Teachers in a Department of a State
University, 1951

Academic            Academic output
rank              High      Low      Total

High               10         8        18
Low                 5        13        18

Total              15        21        36

Academic rank was "high" for full professors, "low"
for all other grades. Academic output was measured by a
system of points for each of a number of activities, such
as books written, articles written, papers read, and so forth.


Another method of correlating data in a 2 X 2 table involves computing
the coefficient of mean square contingency, C. This is computed
from the expression²⁵

$$C = \sqrt{\frac{(a_1b_2 - b_1a_2)^2}{[(a_1 + b_1)(a_2 + b_2)(a_1 + a_2)(b_1 + b_2)] + (a_1b_2 - b_1a_2)^2}},$$

which gives, for our illustration,

$$C = \sqrt{\frac{[(10)(13) - (5)(8)]^2}{[(18)(18)(15)(21)] + [(10)(13) - (5)(8)]^2}} = \sqrt{\frac{8,100}{102,060 + 8,100}} = \sqrt{0.073529} = 0.271.$$

The computations do not automatically provide a sign for C, but a sign
may often be supplied from examination of the data. In this case, it
would be positive.
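The arithmetic for both 2 X 2 measures can be checked with a short Python sketch. The function name and the cell labels (a1, b1 for the first row, a2, b2 for the second) are illustrative assumptions read off the formula above, not notation fixed by the text.

```python
import math


def two_by_two_measures(a1, b1, a2, b2):
    """Return (r, C) for a 2 x 2 table whose rows are (a1, b1) and (a2, b2).

    The cell labels are an assumption made for this sketch.
    """
    cross = a1 * b2 - b1 * a2                      # difference of cross products
    margins = (a1 + b1) * (a2 + b2) * (a1 + a2) * (b1 + b2)
    r = cross / math.sqrt(margins)                 # signed coefficient
    c = math.sqrt(cross ** 2 / (margins + cross ** 2))  # unsigned contingency coefficient
    return r, c


# Table 19.6: rank high/low by output high/low.
r, c = two_by_two_measures(10, 8, 5, 13)
print(round(r, 3), round(c, 3))
```

Swapping the two columns (or the two rows) changes the sign of r but leaves C unchanged, which is the point made in the text about the sign of each measure.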

One advantage of the coefficient of mean square contingency is that its 
use is not limited to 2 X 2 tables. It may be used for larger tables, the 
formula for C being that given in footnote 25. 


²⁵ This is a modification of the usual expression

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}},$$

which makes it unnecessary to compute $\chi^2$ for 2 X 2 tables. Chi-square is discussed
in Chapter 25. For tables larger than 2 X 2, the usual expression would be used.

A disadvantage of C is the fact that C does not have a maximum value
of 1.0. Its maximum value is less than 1.0; for example, it is 0.707 for a
2 X 2 table, 0.816 for a 3 X 3 table, and 0.949 for a 10 X 10 table. For
a table having the same number of columns that it has rows, the maximum
value of C may be had from:

$$\sqrt{\frac{\text{number of columns (or rows)} - 1}{\text{number of columns (or rows)}}}.$$

Corrections²⁶ may be made for this shortcoming of C, but they are not
wholly satisfactory.
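A minimal Python sketch of the upper limit of C for a square table follows; the function name is an illustrative assumption.

```python
import math


def max_contingency(k):
    """Largest attainable C for a table with k rows and k columns."""
    return math.sqrt((k - 1) / k)


for k in (2, 3, 10):
    print(k, round(max_contingency(k), 3))
```

The limit creeps toward 1.0 as the table grows, so values of C from tables of different sizes are not directly comparable.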

Various other methods of correlating data in 2 X 2 tables are available.
Among these are: tetrachoric correlation,²⁷ the method of unlike
signs,²⁸ the cosine π method,²⁹ and the method of concurrent deviations.³⁰

²⁶ See C. C. Peters and W. R. Van Voorhis, Statistical Procedures and Their Mathematical
Bases, McGraw-Hill Book Co., Inc., New York, 1940, pp. 393-399.

²⁷ See R. Ferber, Statistical Techniques in Market Research, McGraw-Hill Book Co.,
Inc., New York, 1949, pp. 343-344.

²⁸ See the first edition of this text, pp. 688-689.

²⁹ See H. O. Rugg, Statistical Methods Applied to Education, Houghton Mifflin
Co., Boston, 1917, pp. 294-297.

³⁰ See H. Secrist, An Introduction to Statistical Methods, The Macmillan Co., New
York, 1933, pp. 430-432.



Symbols Used in Chapter 20


a: value of $Y_c$ when X = 0 in the estimating equations $Y_c = a + bX$,
$Y_c = a + bX + cX^2$, and $Y_c = a + bX + cX^2 + dX^3$; value of $(\sqrt{Y})_c$
when X = 0 in the estimating equation $(\sqrt{Y})_c = a + bX$; value of
$(1/Y)_c$ when X = 0 in the estimating equation $(1/Y)_c = a + bX$.

Log a is the value of $(\log Y)_c$ when X = 0 in the estimating equation
$(\log Y)_c = \log a + X \log b$ and when X = 1 in the estimating equation
$(\log Y)_c = \log a + b \log X$.

b: b, or log b, is a constant in the various estimating equations mentioned
above for a.

c: a constant in the estimating equations $Y_c = a + bX + cX^2$ and
$Y_c = a + bX + cX^2 + dX^3$.

d: a constant in the estimating equation $Y_c = a + bX + cX^2 + dX^3$.

$\eta$: lower-case Greek eta, the correlation ratio.

k: the number of columns in a correlation table.

N: the number of items in a sample. In two-variable linear or non-linear
correlation, N is the number of pairs of items.

$N_c$: the number of items in a column in a correlation table.

$\Omega$: upper-case Greek omega, used to identify a column in the Doolittle
forward solution, in which the first entries in each section are $\Sigma Y$,
$\Sigma XY$, $\Sigma X^2Y$, and so on.

$r^2_{Y.X}$: coefficient of determination for X and Y.

$r^2_{Y.XX^2}$: coefficient of determination for X and Y, the estimating equation
$Y_c = a + bX + cX^2$ having been used.

$r^2_{Y.XX^2X^3}$: coefficient of determination for X and Y, the estimating equation
$Y_c = a + bX + cX^2 + dX^3$ having been used.

$r^2_{YX^2.X}$: a measure of (1) the increased variation attributable to the use of
$X^2$, expressed as a ratio of (2) the amount of variation unexplained by
the use of X alone. See the coefficient of partial determination,
explained in Chapter 21.

$r^2_{\log Y.X}$: coefficient of determination for X and log Y.

$r^2_{\log Y.\log X}$: coefficient of determination for log X and log Y.

$r^2_{(1/Y).X}$: coefficient of determination for X and $1/Y$.

$r^2_{\sqrt{Y}.X}$: coefficient of determination for X and $\sqrt{Y}$.

$s_{Y.X}$: standard error of estimate for the estimating equation $Y_c = a + bX$.

$s_{Y.XX^2}$: standard error of estimate for the estimating equation
$Y_c = a + bX + cX^2$.

$s_{Y.XX^2X^3}$: standard error of estimate for the estimating equation
$Y_c = a + bX + cX^2 + dX^3$.

$s_{\log Y.X}$: standard error of estimate for the estimating equation
$(\log Y)_c = \log a + X \log b$.

$s_{\log Y.\log X}$: standard error of estimate for the estimating equation
$(\log Y)_c = \log a + b \log X$.

$s_{(1/Y).X}$: standard error of estimate for the estimating equation
$(1/Y)_c = a + bX$.

$s_{\sqrt{Y}.X}$: standard error of estimate for the estimating equation
$(\sqrt{Y})_c = a + bX$.

$\Sigma$: upper-case Greek sigma, meaning "take the sum of."

$\Sigma_1^k$: a summation over the k columns in a correlation table.

$\Sigma_1^{N_c}$: a summation over the $N_c$ items in a column in a correlation table.

$\Sigma y^2$: total variation of the Y values.

$\Sigma(\log y)^2$: total variation of the log Y values. See footnotes 8 and 9.

$\Sigma(1/y)^2$: total variation of the $1/Y$ values. See footnote 13.

$\Sigma(\sqrt{y})^2$: total variation of the $\sqrt{Y}$ values. See footnote 10.

$\Sigma y_c^2$: explained variation for the estimating equation $Y_c = a + bX$.

$\Sigma y^2_{cY.XX^2}$: explained variation for the estimating equation
$Y_c = a + bX + cX^2$.

$\Sigma y^2_{cY.XX^2X^3}$: explained variation for the estimating equation
$Y_c = a + bX + cX^2 + dX^3$.

$\Sigma(\log y)_c^2$: explained variation for the estimating equation
$(\log Y)_c = \log a + b \log X$ or for the estimating equation
$(\log Y)_c = \log a + X \log b$. See footnote 9.

$\Sigma(1/y)_c^2$: explained variation for the estimating equation $(1/Y)_c = a + bX$.

$\Sigma(\sqrt{y})_c^2$: explained variation for the estimating equation
$(\sqrt{Y})_c = a + bX$.

$\Sigma y_s^2$: unexplained variation for the estimating equation $Y_c = a + bX$.

$\Sigma y^2_{sY.XX^2}$: unexplained variation for the estimating equation
$Y_c = a + bX + cX^2$.

$\Sigma y^2_{sY.XX^2X^3}$: unexplained variation for the estimating equation
$Y_c = a + bX + cX^2 + dX^3$.

$\Sigma(\log y)_s^2$: unexplained variation for the estimating equation
$(\log Y)_c = \log a + b \log X$ or for the estimating equation
$(\log Y)_c = \log a + X \log b$. See footnote 9.

$\Sigma(1/y)_s^2$: unexplained variation for the estimating equation $(1/Y)_c = a + bX$.

$\Sigma(\sqrt{y})_s^2$: unexplained variation for the estimating equation
$(\sqrt{Y})_c = a + bX$.

X: the X series, also an observed value in the X series. Thus, we refer
to correlating X and Y, but $\Sigma X$ means "sum the values in the X
series."

y: see $\Sigma y^2$; $y = Y - \bar{Y}$.

$y_c$: see $\Sigma y_c^2$ and $\Sigma y_c^2$ with various additional subscripts. In general,
$y_c$ (with or without additional subscripts) is the difference between the
appropriate computed Y, or computed transformed Y, value and the
corresponding arithmetic mean.

$y_s$: see $\Sigma y_s^2$ and $\Sigma y_s^2$ with various additional subscripts. In general,
$y_s$ (with or without additional subscripts) is the difference between an
observed Y, or transformed observed Y, value and the corresponding
computed value.

Y: the Y series, also an observed value in the Y series. Thus, we refer
to correlating X and Y, but $\Sigma Y$ means "sum the values in the Y
series."

$\bar{Y}$: the arithmetic mean of the Y values.

$\bar{Y}_c$: when used in connection with the correlation ratio, the arithmetic
mean of a column. (This symbol was used in the preceding chapter to
mean the arithmetic mean of the computed Y values, but it is not so
used in this chapter.)

$\overline{\log Y}$: the arithmetic mean of the log Y values.

$\overline{(1/Y)}$: the arithmetic mean of the $1/Y$ values.

$\overline{\sqrt{Y}}$: the arithmetic mean of the $\sqrt{Y}$ values.

$Y_c$: a computed Y value.

$(\log Y)_c$: a computed log Y value.

$(1/Y)_c$: a computed $1/Y$ value.

$(\sqrt{Y})_c$: a computed $\sqrt{Y}$ value.

CHAPTER 20

Correlation II: Two-variable Non-linear Correlation


The preceding chapter considered the simplest type of relationship
between two variables: a constant amount of increase in the dependent
variable associated with a unit increase in the independent variable. Not
always, however, is the linear hypothesis satisfactory. The data of
diameter growth and height growth of the trees, shown in Chart 19.5,
were adequately described by a linear estimating equation. The relationship
between the diameter and the volume of trees is not linear, as
may be seen in Chart 20.1, which presents the data of Table 20.1. As
noted in the table, the volume figures represent one-tenth of the number
of board feet of lumber in a tree. The 20 pairs of values are for ponderosa
pine trees selected at random from a Tree Measurement Book from the
Coconino National Forest in Arizona.

POLYNOMIALS 

Second-degree curve. To describe the relationship between diameter
and volume, we shall first employ an estimating equation of the type

$$Y_c = a + bX + cX^2$$

and compare our results with those obtained when using a straight line.
After considering an estimating equation of the type

$$Y_c = a + bX + cX^2 + dX^3$$

for a different set of illustrative data, we shall return to the data of
diameter and volume of ponderosa pine trees and examine several possible
transformations of those data.

For a second-degree curve, three normal equations are required. They
are:

I. $\Sigma Y = Na + b\Sigma X + c\Sigma X^2$;

II. $\Sigma XY = a\Sigma X + b\Sigma X^2 + c\Sigma X^3$;

III. $\Sigma X^2Y = a\Sigma X^2 + b\Sigma X^3 + c\Sigma X^4$.

Chart 20.1. Diameter and Volume of Twenty Ponderosa Pine Trees and
Second-Degree Estimating Equation, with Zones of ±1, ±2, and ±3 Standard
Errors of Estimate. Data of Table 20.1. Estimating equation shown by solid line.
(Vertical axis: volume, board feet ÷ 10.)

Substituting the values obtained in Table 20.1, we have

I. 2,460 = 20a + 569b + 17,437c;

II. 83,777 = 569a + 17,437b + 567,749c;

III. 2,949,733 = 17,437a + 567,749b + 19,361,917c.

TABLE 20.1

Computation of Values Used for Determining Measures of Relationship
Based on Straight-Line and on Second-Degree Curve for Diameter
and Volume of Twenty Ponderosa Pine Trees

X = diameter at breast height (inches); Y = volume* (board feet ÷ 10).

           X       Y       XY         X²Y       X²        X³           X⁴        Y²
          36     192    6,912     248,832    1,296    46,656    1,679,616    36,864
          28     113    3,164      88,592      784    21,952      614,656    12,769
          28      88    2,464      68,992      784    21,952      614,656     7,744
          41     294   12,054     494,214    1,681    68,921    2,825,761    86,436
          19      28      532      10,108      361     6,859      130,321       784
          32     123    3,936     125,952    1,024    32,768    1,048,576    15,129
          22      51    1,122      24,684      484    10,648      234,256     2,601
          38     252    9,576     363,888    1,444    54,872    2,085,136    63,504
          25      56    1,400      35,000      625    15,625      390,625     3,136
          17      16      272       4,624      289     4,913       83,521       256
          31     141    4,371     135,501      961    29,791      923,521    19,881
          20      32      640      12,800      400     8,000      160,000     1,024
          25      86    2,150      53,750      625    15,625      390,625     7,396
          19      21      399       7,581      361     6,859      130,321       441
          39     231    9,009     351,351    1,521    59,319    2,313,441    53,361
          33     187    6,171     203,643    1,089    35,937    1,185,921    34,969
          17      22      374       6,358      289     4,913       83,521       484
          37     205    7,585     280,645    1,369    50,653    1,874,161    42,025
          23      57    1,311      30,153      529    12,167      279,841     3,249
          39     265   10,335     403,065    1,521    59,319    2,313,441    70,225
Total    569   2,460   83,777   2,949,733   17,437   567,749   19,361,917   462,278

* Volume was ascertained by means of the "Scribner decimal C" rule, which is described in D. Bruce
and F. X. Schumacher, Forest Mensuration, McGraw-Hill Book Co., Inc., New York, 1942, pp. 159-163.

Data supplied by courtesy of the Forest Service of the United States Department of Agriculture.
The figures are a random sample from a Tree Measurement Book from the Coconino National Forest
in Arizona.


In order to get the values of a, b, and c, it is necessary to solve these
three equations simultaneously. In describing one procedure for solving
three simultaneous equations, we shall first state each step in general
terms and then indicate the specific operation for this problem. The
steps are:

1. Multiply normal equation I by such a number that the coefficient of
one unknown will become the same as the coefficient of the same
unknown in normal equation II. For our data, normal equation I
is multiplied by $\Sigma X \div N = 28.45$ to yield

(I X 28.45). 69,987 = 569a + 16,188.05b + 496,082.65c.

2. Subtract modified equation I from II, or vice versa, to yield Equation A,
which will contain two unknowns. For the present problem,
Equation A will contain only b and c.

II. 83,777 = 569a + 17,437b + 567,749c.

(I X 28.45). 69,987 = 569a + 16,188.05b + 496,082.65c.

A. 13,790 = 1,248.95b + 71,666.35c.

3. Multiply normal equation II by such a number that the coefficient
of the unknown which is not in Equation A will be the same in II
as in normal equation III. In our problem, we multiply normal
equation II by $\Sigma X^2 \div \Sigma X = 30.644991$, obtaining

(II X 30.644991). 2,567,345.411 = 17,437a + 534,356.708b + 17,398,662.995c.

4. Subtract modified equation II from III, or vice versa, to get Equation B,
which will contain the same two unknowns as Equation A.
For our data, we have:

III. 2,949,733 = 17,437a + 567,749b + 19,361,917c.

(II X 30.644991). 2,567,345.411 = 17,437a + 534,356.708b + 17,398,662.995c.

B. 382,387.589 = 33,392.292b + 1,963,254.005c.

5. Solve Equations A and B simultaneously (the procedure was
described on pages 268-269) to obtain the values of the two constants
in those equations. Doing this for the data of diameter and
volume of the trees gives:

b = -5.620315;
c = +0.2903663.

6. Substitute, in any one of the normal equations, the values computed
in Step 5 in order to find the value of the unknown which was not in
Equations A and B. Using I, we have

2,460 = 20a + (569)(-5.620315) + (17,437)(0.2903663).

20a = 594.842;
a = 29.7421.

7. As a check, substitute the values obtained in Steps 5 and 6 in a normal
equation not used in Step 6. Employing Equation II gives

83,777 = (569)(29.7421) + (17,437)(-5.620315) + (567,749)(0.2903663),

= 83,776.9987.

The second-degree equation for estimating tree volume from diameter is

$$Y_c = 29.7 - 5.62X + 0.2904X^2.$$

This equation is shown on Chart 20.1 by a solid line. In view of the
appearance of the scatter diagram and the estimating equation, the reader
may be surprised that b has a negative sign. The reason is that Chart
20.1 shows only part of the curve. If the chart were to be redrawn with
a horizontal scale beginning at zero, the estimating equation would be
seen to be roughly U-shaped.

For a tree having a diameter of 30 inches, the estimated volume would
be

$$Y_c = 29.7 - (5.62)(30) + (0.2904)(30)^2 = 122.5 \text{ tens of board feet}.$$
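The seven steps can be cross-checked by machine. The following Python sketch rebuilds the sums of Table 20.1 from the twenty pairs of observations, forms the normal equations, and solves them by ordinary Gaussian elimination (a swapped-in technique; it is not the step-by-step hand procedure described above). The function names are illustrative assumptions.

```python
# (diameter X in inches, volume Y in tens of board feet), from Table 20.1
TREES = [(36, 192), (28, 113), (28, 88), (41, 294), (19, 28), (32, 123),
         (22, 51), (38, 252), (25, 56), (17, 16), (31, 141), (20, 32),
         (25, 86), (19, 21), (39, 231), (33, 187), (17, 22), (37, 205),
         (23, 57), (39, 265)]


def power_sums(data, degree):
    """Return ([sum X^0 .. sum X^(2*degree)], [sum X^0*Y .. sum X^degree*Y])."""
    sx = [sum(x ** p for x, _ in data) for p in range(2 * degree + 1)]
    sxy = [sum(x ** p * y for x, y in data) for p in range(degree + 1)]
    return sx, sxy


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting, then back substitution."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    sol = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - known) / aug[i][i]
    return sol


def fit_polynomial(data, degree):
    """Least-squares constants (a, b, c, ...) from the normal equations."""
    sx, sxy = power_sums(data, degree)
    matrix = [[sx[i + j] for j in range(degree + 1)] for i in range(degree + 1)]
    return solve(matrix, sxy)


a, b, c = fit_polynomial(TREES, 2)
estimate_30 = a + b * 30 + c * 30 ** 2   # estimated volume at a 30-inch diameter
print(a, b, c, estimate_30)
```

Because the machine solution carries full floating-point precision, its last digits may differ slightly from the hand computation, which rounds the multipliers in Steps 1 and 3.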


Total variation is computed by means of the same expression that was
used for linear correlation,

$$\Sigma y^2 = \Sigma Y^2 - \bar{Y}\Sigma Y = 462,278 - (123)(2,460) = 159,698.$$

Since we have the values of a, b, and c, we can ascertain the explained
variation, which is¹

$$\Sigma y^2_{cY.XX^2} = a\Sigma Y + b\Sigma XY + c\Sigma X^2Y - \bar{Y}\Sigma Y$$

$$= (29.7421)(2,460) + (-5.620315)(83,777) + (0.2903663)(2,949,733) - (123)(2,460)$$

$$= 156,235.5.$$

We may now obtain $\Sigma y^2_{sY.XX^2}$ in the same manner as for linear
correlation:

$$\Sigma y^2_{sY.XX^2} = \Sigma y^2 - \Sigma y^2_{cY.XX^2} = 159,698 - 156,235.5 = 3,462.5.$$

The standard error of estimate is

$$s_{Y.XX^2} = \sqrt{\frac{\Sigma y^2_{sY.XX^2}}{N}} = \sqrt{\frac{3,462.5}{20}} = 13.2 \text{ tens of board feet}.$$

The zones of ±1, ±2, and ±3 $s_{Y.XX^2}$ around the estimating equation are
shown in Chart 20.1 by broken lines. Estimates of volume, such as that
made for a tree having a diameter of 30 inches, may be written $Y_c \pm 13.2$.
The coefficient of determination is, as before, the ratio of explained
variation to total variation:

$$r^2_{Y.XX^2} = \frac{\Sigma y^2_{cY.XX^2}}{\Sigma y^2} = \frac{156,235.5}{159,698} = 0.978.$$

The coefficient of correlation is the square root of this figure,

$$r_{Y.XX^2} = 0.989,$$

but it has no sign. The reason for the lack of a sign is that, when an
estimating equation is curvilinear, the relationship between the two
variables may be positive in one portion of the equation but negative in
another portion.

¹ $Y.XX^2$ is a rather awkward subscript, but it indicates quite clearly that we are
dealing with measures computed in relation to an estimating equation employing the
first and second powers of the independent variable.
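The chain from total variation to the coefficient of correlation can be traced in a few lines of Python. This is a sketch that takes the sums of Table 20.1 and the constants a, b, and c exactly as printed in the text; the variable names are illustrative assumptions.

```python
import math

# Sums from Table 20.1 and the constants of the second-degree equation.
N, SUM_Y, SUM_XY, SUM_X2Y, SUM_Y2 = 20, 2460, 83777, 2949733, 462278
a, b, c = 29.7421, -5.620315, 0.2903663

mean_y = SUM_Y / N                                   # 123
total = SUM_Y2 - mean_y * SUM_Y                      # total variation
explained = a * SUM_Y + b * SUM_XY + c * SUM_X2Y - mean_y * SUM_Y
unexplained = total - explained
std_error = math.sqrt(unexplained / N)               # standard error of estimate
r_squared = explained / total                        # coefficient of determination
r = math.sqrt(r_squared)                             # carries no sign for a curve

print(f"total={total:,.0f}  explained={explained:,.1f}  unexplained={unexplained:,.1f}")
print(f"s={std_error:.1f}  r2={r_squared:.3f}  r={r:.3f}")
```

Because the constants are multiplied by sums running into the millions, they must be carried to many more places than are finally reported; rounding b to -5.62 here would change the explained variation by many units.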

Comparison of results with those obtained from the use of a straight line.
From the appearance of Chart 20.1, it is quite clear that the relationship
between the diameter and volume of the ponderosa pine trees is non-linear,
and we shall see, in Chapter 26, that the correlation resulting from
the use of the second-degree curve is significantly higher than that based
upon a straight line. For the present, we are interested only in comparing
the results just obtained with those for a straight-line relationship.
Using N and the appropriate summations from Table 20.1, the solution
of the normal equations

I. $\Sigma Y = Na + b\Sigma X$ and
II. $\Sigma XY = a\Sigma X + b\Sigma X^2$

gives

a = -191.124274 and
b = 11.041275.

The straight-line estimating equation is

$$Y_c = -191.1 + 11.04X.$$

This equation is shown, by means of a solid line, on Chart 20.2, and it is
clear that a straight line is not a satisfactory description of the relationship.
Explained variation, from the straight line, is

$$\Sigma y_c^2 = a\Sigma Y + b\Sigma XY - \bar{Y}\Sigma Y$$

$$= (-191.124274)(2,460) + (11.041275)(83,777) - (123)(2,460) = 152,259.2.$$

Total variation is

$$\Sigma y^2 = \Sigma Y^2 - \bar{Y}\Sigma Y = 462,278 - (123)(2,460) = 159,698,$$

so that unexplained variation is 159,698 - 152,259.2 = 7,438.8, and the
standard error of estimate is

$$s_{Y.X} = \sqrt{\frac{7,438.8}{20}} = 19.3 \text{ tens of board feet},$$

a decidedly larger value than was obtained when the second-degree curve
was used. The zones of ±1, ±2, and ±3 $s_{Y.X}$ are shown on Chart 20.2 by
broken lines.

Chart 20.2. Diameter and Volume of Twenty Ponderosa Pine Trees and
Straight-line Estimating Equation, with Zones of ±1, ±2, and ±3 Standard
Errors of Estimate. Data of Table 20.1. Estimating equation shown by solid line.
(Vertical axis: volume, board feet ÷ 10; horizontal axis: diameter in inches.)

As was to be expected,² the linear coefficients of determination and
correlation are smaller than those based upon the second-degree curve.
They are:

$$r^2 = \frac{\Sigma y_c^2}{\Sigma y^2} = \frac{152,259.2}{159,698} = 0.953, \text{ and}$$

$$r = +0.976.$$

Third-degree curve. As an illustration of the third-degree curve, 
and, incidentally, also of the law of diminishing returns, we shall use data 
derived from experiments with nitrogen fertilizer and tobacco yield at 
Tifton, Georgia. One thousand pounds of fertilizer per acre were applied 
to five different plots. Of the active ingredients, phosphoric acid and 
potash were held constant at 8 per cent and 5 per cent, respectively; and 
the nitrogen was made to vary as follows: none, 2 per cent, 3 per cent, 
4 per cent, 5 per cent. Presumably the experiment was so conducted that 
differences in yield were not attributable to differences in soil fertility, 
drainage, and so forth, between plots. The experiment was repeated in 
three different years. Of the total variation, what proportion can be 
explained by the varying amount of nitrogen used? While it is possible 
that the experiment was not perfectly designed, the data indicate almost 
perfect correlation when the relationship is assumed to be of the type 


$$Y_c = a + bX + cX^2 + dX^3.$$


² It is possible to set up a measure

$$r^2_{YX^2.X} = \frac{\Sigma y^2_{cY.XX^2} - \Sigma y^2_{cY.X}}{\Sigma y^2_{sY.X}},$$

which expresses (1) the increase in explained variation, attributable to the use of $X^2$,
as a ratio of (2) the amount of variation unexplained by using X alone. Dividing
the numerator and denominator of the above expression by $\Sigma y^2$ allows us to write

$$r^2_{YX^2.X} = \frac{r^2_{Y.XX^2} - r^2_{Y.X}}{1 - r^2_{Y.X}}.$$

This measure is strictly analogous to the coefficient of partial determination,
discussed in the next chapter. It will be referred to again in Chapter 26 when we
undertake to ascertain whether the non-linear coefficient of determination is significantly
larger than the linear coefficient.

This can be roughly verified by inspection of the scatter diagram, Chart
20.3. The heavy horizontal lines are the average yields for each of the
percentages of nitrogen which are given. These means are not necessary
for the solution of the problem, but are useful in discovering the type of
curve to fit.

Chart 20.3. Per Cent Nitrogen in Fertilizer and
Yield Per Acre of Tobacco, at Tifton, Georgia.
Data of Table 20.2. The horizontal lines show the average
yield per acre for each percentage of nitrogen, while
the curve represents values computed from the third-degree
equation. (Vertical axis: yield in pounds.)

Solution of normal equations. Since four constants must be found, four
normal equations of the following type must be used:³

I. $\Sigma Y = Na + b\Sigma X + c\Sigma X^2 + d\Sigma X^3$;

II. $\Sigma XY = a\Sigma X + b\Sigma X^2 + c\Sigma X^3 + d\Sigma X^4$;

III. $\Sigma X^2Y = a\Sigma X^2 + b\Sigma X^3 + c\Sigma X^4 + d\Sigma X^5$;

IV. $\Sigma X^3Y = a\Sigma X^3 + b\Sigma X^4 + c\Sigma X^5 + d\Sigma X^6$.

³ Had three observations been taken for 1 per cent nitrogen, the origin could
conveniently have been taken at the mean of the X values (2.5). Then the sum of the
odd powers of X would have been zero, and would have disappeared from the normal
equations. We should then have had two pairs of normal equations to solve
simultaneously:

I. $\Sigma Y = Na + c\Sigma X^2$;              II. $\Sigma XY = b\Sigma X^2 + d\Sigma X^4$;
III. $\Sigma X^2Y = a\Sigma X^2 + c\Sigma X^4$;     IV. $\Sigma X^3Y = b\Sigma X^4 + d\Sigma X^6$.

The values required are computed in Table 20.2, and their substitutions
result in the following normal equations:

I. 16,934 = 15a + 42b + 162c + 672d;

II. 50,630 = 42a + 162b + 672c + 2,934d;

III. 197,198 = 162a + 672b + 2,934c + 13,272d;

IV. 822,884 = 672a + 2,934b + 13,272c + 61,542d.

Following our previous procedure, we may solve together Equations I and
II; II and III; III and IV, in each case eliminating a. This gives three
equations:

A. 48,222 = 666b + 3,276c + 15,786d;

B. 80,256 = 1,980b + 14,364c + 82,116d;

C. 790,152 = 23,724b + 178,416c + 1,051,020d.

We may now solve together A and B and then B and C, eliminating b.
The equations are thus reduced to two:

D. -42,029,064 = 3,079,944c + 23,432,976d;

E. -339,492,384 = 12,492,144c + 132,899,616d.

Solving Equations D and E simultaneously, we find that

d = -4.4648847

and

c = 20.323899.

By substituting these values in Equation A, B, or C, we find that

b = 78.263630.

Substituting the values found for b, c, and d in Equation I, II, III, or IV,
we find

a = 890.32389.

It is advisable to check the values of d, c, b, and a at each step, since any
error made in the early stages will vitiate all subsequent computations.
One method of checking is to calculate each of the constants twice, by
substituting in two different equations. Possibly even better is to substitute
all of the constants known at any time in one of the remaining
equations. For instance, if the value of a has been found by substituting
values of b, c, and d in Equation I, a final check may be made by substituting
a, b, c, and d in Equation IV. Thus,

822,884 = 672(890.32389) + 2,934(78.263630) + 13,272(20.323899) + 61,542(-4.4648847)

= 598,297.65 + 229,625.49 + 269,738.79 - 274,777.93

= 822,884.00.
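The successive eliminations above can be mirrored in Python. This sketch works in exact rational arithmetic (`fractions.Fraction` from the standard library), so that the final check in Equation IV holds exactly rather than to two decimals; the function and variable names are illustrative assumptions.

```python
from fractions import Fraction

# The four normal equations: (coefficients of a, b, c, d), right-hand side.
NORMAL = [
    ([15, 42, 162, 672], 16934),
    ([42, 162, 672, 2934], 50630),
    ([162, 672, 2934, 13272], 197198),
    ([672, 2934, 13272, 61542], 822884),
]


def solve_exact(equations):
    """Eliminate a, then b, then c, and back-substitute, all in exact fractions."""
    n = len(equations)
    aug = [[Fraction(v) for v in coeffs] + [Fraction(rhs)]
           for coeffs, rhs in equations]
    for col in range(n):
        pivot_row = aug[col]          # pivots are nonzero for these equations
        for r in range(col + 1, n):
            factor = aug[r][col] / pivot_row[col]
            aug[r] = [v - factor * p for v, p in zip(aug[r], pivot_row)]
    sol = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - known) / aug[i][i]
    return sol


solution = solve_exact(NORMAL)
a, b, c, d = (float(v) for v in solution)

# Final check: substitute all four constants in Equation IV.
check_iv = sum(k * v for k, v in zip(NORMAL[3][0], solution))
print(a, b, c, d, check_iv)
```

Exact fractions are used here only to make the check unambiguous; ordinary floating-point arithmetic would serve for a problem of this size.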



TABLE 20.2

Computation of Values Required to Obtain Measures of Relationship Between Per Cent Nitrogen in Fertilizer and Yield
per Acre of Tobacco, Tifton, Georgia

(Fertilizer is 1,000 pounds per acre; P2O5 and K2O are 8 per cent and 5 per cent, respectively. The yields on all plots were unusually high in 1925;
consequently, they were reduced by a factor which reduced their average to the average of 1924 and 1920.)

The estimating equation, then, is

$$Y_c = 890.32 + 78.264X + 20.324X^2 - 4.4649X^3.$$

Using this equation, the $Y_c$ values may be computed as follows:

   X       a + bX        cX²         dX³      Y_c (pounds)
   0       890.32          0           0          890.3
   1       968.58      20.32       -4.46          984.4
   2     1,046.85      81.30      -35.72        1,092.4
   3     1,125.11     182.92     -120.56        1,187.5
   4     1,203.37     325.18     -285.76        1,242.8
   5     1,281.64     508.10     -558.12        1,231.6

If we omit the $Y_c$ value for X = 1 (since there is no observation for
X = 1), sum the other $Y_c$ values, and multiply the result by 3 (since there
were three observations for each X value) we obtain 16,933.8 pounds,
which is in agreement with the $\Sigma Y$ value of Table 20.2.

As can be seen from Chart 20.3, there is a point of inflection at about
1½ per cent nitrogen, and the curve reaches a maximum of nearly 1,250
pounds shortly after the nitrogen reaches 4 per cent. These are, respectively,
the points of diminishing marginal returns and diminishing total
returns. How to locate these points more exactly is explained in Appendix S,
section 20.1.
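The two points read from the chart can also be located from the derivatives of the third-degree equation. The sketch below is our own illustration using elementary calculus (the second derivative is zero at the inflection, the first derivative is zero at the maximum); it is not a reproduction of the method given in the appendix, and the names are illustrative assumptions.

```python
import math

# Constants of the third-degree estimating equation, as found in the text.
a, b, c, d = 890.32389, 78.263630, 20.323899, -4.4648847


def y_c(x):
    """Computed yield (pounds per acre) for x per cent nitrogen."""
    return a + b * x + c * x ** 2 + d * x ** 3


# Inflection: second derivative 2c + 6dX = 0.
inflection_x = -c / (3 * d)

# Turning points: first derivative b + 2cX + 3dX^2 = 0 (a quadratic in X).
disc = (2 * c) ** 2 - 4 * (3 * d) * b
roots = [(-2 * c + sign * math.sqrt(disc)) / (2 * 3 * d) for sign in (1, -1)]
# The maximum is the root at which the second derivative is negative.
max_x = next(x for x in roots if 2 * c + 6 * d * x < 0)

# Check of the Y_c column: three observations at each of X = 0, 2, 3, 4, 5.
fitted_total = 3 * sum(y_c(x) for x in (0, 2, 3, 4, 5))

print(round(inflection_x, 2), round(max_x, 2), round(y_c(max_x), 1), round(fitted_total, 1))
```

The sum of the computed values agrees with the sum of the observed yields because the first normal equation forces the two totals to be equal.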

Correlation coefficient and standard error of estimate. To compute
$r_{Y.XX^2X^3}$ and $s_{Y.XX^2X^3}$, we need $\Sigma y^2_{cY.XX^2X^3}$ and $\Sigma y^2_{sY.XX^2X^3}$. These are:

$$\Sigma y^2_{cY.XX^2X^3} = a\Sigma Y + b\Sigma XY + c\Sigma X^2Y + d\Sigma X^3Y - \bar{Y}\Sigma Y$$

$$= (890.32389)(16,934) + (78.263630)(50,630) + (20.323899)(197,198) + (-4.4648847)(822,884) - (1,128.93333)(16,934)$$

$$= 255,624.$$

$$\Sigma y^2 = \Sigma Y^2 - \bar{Y}\Sigma Y = 19,377,528 - (1,128.93333)(16,934) = 260,171.$$

$$\Sigma y^2_{sY.XX^2X^3} = \Sigma y^2 - \Sigma y^2_{cY.XX^2X^3} = 260,171 - 255,624 = 4,547.$$

From these we obtain

$$r^2_{Y.XX^2X^3} = \frac{\Sigma y^2_{cY.XX^2X^3}}{\Sigma y^2} = \frac{255,624}{260,171} = 0.983.$$

$$r_{Y.XX^2X^3} = 0.991.$$

Doolittle method. It must be confessed that, when there are as many
as four equations to solve simultaneously, the above procedure is somewhat
laborious. Furthermore, no check can be applied until the value
of d is obtained. Even that does not check the accuracy of any work
except the solution of the two equations (D and E) necessary to obtain
c and d. All of the preceding work could have been honeycombed with
errors and still the solution of these two equations would check. It is
not until all of the constants are obtained that we have any real check on
the accuracy of the solution of the four normal equations. If the final
check fails, all of the work must be repeated.
Fortunately there is available for solving equations of this type simul- 
taneously a systematic method that provides frequent checks on accu- 
racy and is less laborious than the above procedure when there are four 
or more equations. It is known as the Doolittle method, having been 
developed by M. H. Doolittle. Like many labor-saving devices in 
statistics, the method at first seems very confusing. To a certain 
extent there is a substitution of complexity of procedure for repetitive 
drudgery. 

The Doolittle method is illustrated by Table 20.3. There are five
parts to this table:

Part 1. Normal equations. These are the same equations that are
found on page 495, but all of the terms have been put on the left side, so
that each equation equals zero.

Part 2. Forward solution. This solution obtains a value for d
(-4.4648919, found in row IV′, column Ω), and provides the figures
with which to obtain values for the other constants.

Part 3. Back solution. In this part we compute by a simple process
the values, in turn, for c, b, and a.

Part 4. Estimating equation. Note that this equation agrees, to five 
digits, with the one previously obtained. 

Part 5. Check equation. By substituting the values of the con- 
stants obtained in the last normal equation, the preceding work is 
checked. This step involves nothing new. 

The entries in the forward solution are the most confusing, but if the 
procedure and explanation outlined below are followed very carefully, 
no trouble will be experienced in applying the Doolittle method to the 
solution of equations of this type. It is desirable that work be done in 
pencil first. This will permit some of the entries to be made in boldface, 
as indicated in Table 20.3, merely by converting the pencil figures into 
ink. The steps in the forward solution are as follows: 

1. Divide the forward solution table into as many sections as there are
normal equations. Leave a space between sections, and separate also by
a horizontal line as shown. Allow in each section two more rows than
the section number, except that section one requires only two rows,
rather than three.

2. Label the columns: (1), (2), (3), (4), Ω, and Check total. Five constants
would require five normal equations, and therefore a column (5)
also. Enter also the descriptive matter in the stub, as shown in Table
20.3.

3. Record the appropriate normal-equation coefficients in the first
row of each section, being sure to indicate minus signs.

4. Total each normal equation algebraically; record the results in the
last column.

5. Make the following entries in the last row of each section:

-1.00000000 in row I′, column (1);

-1.0000000 in row II′, column (2);

-1.000000 in row III′, column (3);

-1.00000 in row IV′, column (4).

The number of zeros after 1. indicates the minimum number of decimal
places to carry computations in each section. The reason for dropping
an additional decimal place as computations proceed from section to
section is that errors from rounding the figures cumulate, and the number
of significant places becomes smaller. It is advisable, however,
never to record fewer than eight digits, including the decimal places.

6. Row I′ is the result of dividing row I by the number in cell ΣI(1)
and changing signs. The sum of the first five entries in this row should
be checked against the entry in the total column, and agreement indicated
by a check mark. Values in columns (2), (3), (4), and Ω of this
row should be entered in boldface, as further use is to be made of them.
(As suggested above, this is most easily done by reinforcing the original
pencil entries with ink.)

7. The entries in the second row of section II, which is labeled ΣI X
I′(2), are a result of multiplying the items in row ΣI by the number (in
boldface) in the cell which is an intersection of row I′ and column (2).
In similar fashion, immediately below each row of normal-equation
coefficients are found the corresponding "product rows." These rows
are called product rows because they are the result of making multiplications,
a description of each such operation being given in the stub of the
table. It helps to keep the process straight if we observe that the multipliers
are always the boldface numbers in the column bearing the same
parenthesized number as the section being computed; and that the
numbers multiplied are those in the row immediately above the boldface
number in question. A check on the accuracy of these entries is afforded
by totaling each row as it is computed, and indicating by a check mark
agreement with the entry in the total column.

8. The third row of section II, labeled ΣII, is the result of adding
algebraically the two rows above it in that section. Likewise the Σ row
in each section is a vertical summation of all the entries above the Σ row
in the section in question. There is no separate Σ row in section I, since
the section has no product row, and therefore the normal-equation row
automatically becomes also the Σ row. Note that, as the computations
proceed from section to section, there is an increase in the number of
spaces in this row that are left vacant because the entries have become
zero. These Σ rows also should be added horizontally to obtain a check
with the total column.

9. Row II′ is the result of dividing row ΣII by the value in ΣII(2) and
changing signs. So also each "prime" row (III′, IV′, and so forth) is
obtained by dividing each item in a given Σ row by the first entry in that
row, with sign changed. It is because of this fact that the first entry is
always -1. This entry is perhaps a sufficient description to remind us
of the nature of the operation. The prime rows should also check with
the total column. After the check has been made, enter the numbers to
the right of each -1 in ink, up to, but not including, the total column
entry.

The preceding explanation has referred specifically to the steps in- 
volved in sections I and II. The other sections are computed in similar 
fashion, each section requiring the previous computation of the other 
sections. The only variation among the different sections lies in the 
number of product rows and the number of vacant spaces to the left in 
some of the rows. As previously noted, we have obtained (in cell IV' 12)
the value of d, which is -4.4648919. We are now ready to proceed
with the back solution to obtain a, b, and c.

The back solution occasions no difficulty. It consists merely in
substituting the values of the constants, as obtained, in the derived
equations III', II', and I'. The entries in the 1 column are the boldface
items in Column 12 of the forward solution table. The item in the last
row of this column (-4.4648919) is d. This value is recorded in the last
row of the total column. The entries in the d column are the boldface
items of Column (4), above, multiplied by -4.4648919 (the value of d).
The sum of the items in the third row is c (33.970002 - 13.646047 =
20.323955), which is entered in the total column, opposite c. The entries
in the c column are the boldface items of Column (3), above, multiplied
by c. The sum of the items in the second row is b. The entry in the
b column is the boldface entry in Column (2), above, multiplied by b.
The sum of the items in the first row is a. It will be noticed that, in
using the back solution table, we record the column to the right first and
then proceed to the left; and in the total column we proceed from bottom
to top. Proceeding in this fashion is rather unusual, but most convenient
in this case.

The estimating equation arrived at by the Doolittle method, 

$$Y_c = 890.32 + 78.264X + 20.324X^2 - 4.4649X^3,$$

agrees with the equation previously obtained on page 497. 

In the right-hand column of the Doolittle back solution table is provided
a convenient place for computation of the explained sum of squares
by the expression

$$a\sum Y + b\sum XY + c\sum X^2Y + d\sum X^3Y.$$

Note also that ΣY, ΣXY, ΣX²Y, and ΣX³Y (with signs changed) are
found in Column 0 of the forward solution table, the first row of each
section, in that order from top to bottom; while a, b, c, and d are
arranged, in corresponding order, in the left-hand part of the back
solution table. The computations show that

$$a\sum Y + b\sum XY + c\sum X^2Y + d\sum X^3Y = 19{,}372{,}982, \text{ and}$$

$$\sum y^2_{cY.XX^2X^3} = 19{,}372{,}982 - \bar{Y}\sum Y,$$

$$= 19{,}372{,}982 - (1{,}128.93333)(16{,}934),$$

$$= 255{,}625.$$

USE OF TRANSFORMATIONS 

Instead of using a second-degree curve, or a curve of higher order, as an 
estimating equation, we may convert the readings for one or both variables 
into a different form. The most frequently used transformations involve 
logarithms, reciprocals, roots or powers, and logarithms of logarithms. 
Frequently, a transformation will show a linear relationship between the 
two converted series. We shall consider the use of logarithms, roots, and 
reciprocals for the data of diameter and volume of ponderosa pine trees 
which were used earlier in this chapter. First we shall examine the trans- 
formations graphically. Correlation analysis of the data will then be 
made for the transformations that appear most appropriate. The other 
transformations will be dealt with in symbolic terms only. 

Preliminary examination. Based upon our experience with the 
semi-logarithmic chart in Chapter 5, it seems reasonable to think that the 
scatter diagram of Chart 20.1 might straighten out if we were to use a 
grid with a logarithmic vertical scale. In this event, we would use an 
estimating equation of the type⁴

$$(\log Y)_c = \log a + X \log b.$$

Such a scatter diagram is shown in Chart 20.4, and it is clear that the
relationship between log Y and X is not linear.

In Chart 20.5, the same data have been plotted on a grid having both 
vertical and horizontal logarithmic scales. This transformation calls for 
the use of an estimating equation of the type 

$$(\log Y)_c = \log a + b \log X.$$

⁴ The symbol $(\log Y)_c$ is used, rather than $\log Y_c$, to make clear that we are dealing
with "the computed value of log Y," not "the logarithm of the computed value of Y."
For parallel reasons, use is made in the following paragraphs of $(\sqrt{Y})_c$ rather
than $\sqrt{Y_c}$, and $(1/Y)_c$ rather than $1/Y_c$.

Chart 20.4. Diameter and Volume of Twenty Ponderosa Pine Trees Plotted
on a Semi-logarithmic Grid. Data of Table 20.4. (Horizontal axis: diameter
in inches.)


The scatter diagram of Chart 20.5 indicates that the relationship between
log Y and log X is virtually linear.⁵

Another transformation is possibly more logical than either of the two
already tried. Since the volume of a cylinder is directly related to its
length and to the square of the radius (or diameter) of its circular cross
section, it would seem reasonable to try a transformation involving √Y
and X. Of course, a tree is not a cylinder,⁶ but Chart 20.6 shows a scatter
diagram which appears to be more nearly linear than the preceding one.
For this relationship, the estimating equation would be of the type⁷

$$(\sqrt{Y})_c = a + bX.$$

⁵ Occasionally an estimating equation of the type

$$Y_c = a + b \log X$$

is appropriate. For an illustration, see F. E. Croxton, Elementary Statistics With
Applications in Medicine, Prentice-Hall, Inc., New York, 1953, pp. 162-167.

Chart 20.5. Diameter and Volume of Twenty Ponderosa Pine
Trees and Estimating Equation of Type $(\log Y)_c = \log a + b \log X$,
with Zones of ±1, ±2, and ±3 Standard Errors of Estimate, Shown
on a Logarithmic Grid. Data of Table 20.4. Estimating equation
shown by solid line. (Horizontal axis: diameter in inches.)

Although it is not reasonable to expect that 1/Y and X will produce a
linear scatter diagram for these data, Chart 20.7 has, nevertheless, been
prepared. It is clear that this relationship is not suitable for these data,
although it is sometimes useful for other series. The estimating equation
would be of the type⁷

$$\left(\frac{1}{Y}\right)_c = a + bX.$$

⁶ See page 234 of the second edition (1942) of the reference mentioned below Table 20.1.

⁷ See note 4.

Chart 20.6. Diameter and Square Root of Volume of Twenty Ponderosa
Pine Trees and Estimating Equation of Type $(\sqrt{Y})_c = a + bX$, with Zones
of ±1, ±2, and ±3 Standard Errors of Estimate, Shown on an Arithmetic
Grid. Data of Table 20.5. Estimating equation shown by solid line. A square
root vertical scale could have been used for this chart. A grid using a square root
vertical scale and an arithmetic horizontal scale was not used here since paper ruled
in this manner is not readily available to the reader. The equally spaced vertical
scale values could be 0, 1, 4, 9, 16, 25, and so on. (Vertical axis: square root of
volume ÷ 10; horizontal axis: diameter in inches.)

The reader may have noticed that the grids used for Charts 20.4 and
20.5 were so designed that the actual X values and Y values were plotted.
Charts 20.6 and 20.7 did not employ special grids, but used arithmetic
scales, and the √Y and 1/Y values were plotted against the X values.
Special grids could have been used for Charts 20.6 and 20.7; they were not
used because they are not readily available to the reader.


Chart 20.7. Diameter and Reciprocal of Volume of Twenty Ponderosa Pine
Trees, Shown on an Arithmetic Grid. Data from Table 20.1, which does not
show the reciprocals of the Y values. (Vertical axis: reciprocal of
volume ÷ 10.)


We shall now proceed to compute the various correlation measures for
the log Y, log X relationship and for the √Y, X relationship. The
log Y, X relationship and the 1/Y, X relationship will be considered in
terms of symbols only. Because each of the four equation types which
are involved calls for but two unknowns in the estimating equation, all
procedures will parallel those for linear correlation of ungrouped data as
described in Chapter 19. The formulas will be the same as those previously
used, except that (1) log Y, √Y, or 1/Y will be substituted for Y,
and (2) log X will be substituted for X when we use the log Y, log X
relationship.

Since the four transformations which will be considered involve the
logarithms, square roots, or reciprocals of the Y values, two points should
be borne in mind: (1) the least-squares fit does not minimize the sum of
the squares of the $Y - Y_c$ values; it minimizes the sum of the squares
of the deviations of the transformed observed Y values from the computed
transformed Y values; and (2) when stating the amount of dispersion of the
actual Y values from the estimating equation, the standard error of estimate
must be added to and subtracted from the computed Y values when
both are in terms of transformed units; after the addition and subtraction,
the results may be re-converted to units of the original Y series.

The log Y, log X relationship. Chart 20.5 indicated that the
relationship between diameter and volume was nearly linear when both
series were expressed in terms of logarithms. The estimating equation
is of the type

$$(\log Y)_c = \log a + b \log X,$$

and the constants log a and b are obtained by solving simultaneously the
normal equations

$$\text{I.}\quad \sum \log Y = N \log a + b \sum \log X;$$

$$\text{II.}\quad \sum(\log X \cdot \log Y) = \log a \sum \log X + b \sum(\log X)^2.$$

Substituting, in these equations, the values from Table 20.4 (logarithms
are in Appendix R) gives

$$\text{I.}\quad 38.727389 = 20 \log a + 28.728012b;$$

$$\text{II.}\quad 56.619891 = 28.728012 \log a + 41.581145b.$$

Simultaneous solution yields

$$\log a = -2.569125 \text{ and } b = 3.136656.$$

The estimating equation may now be written

$$(\log Y)_c = -2.569125 + 3.136656 \log X.$$
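The simultaneous solution above can be checked with a short Python sketch. It is only an illustration: the function name `fit_log_log` is an assumption of this sketch, and the logarithms are computed in floating point rather than read from Appendix R, so the results agree with the text only to about four or five decimal places.

```python
import math

# Diameter X (inches) and volume Y (board feet / 10) for the twenty trees of Table 20.4.
X = [36, 28, 28, 41, 19, 32, 22, 38, 25, 17, 31, 20, 25, 19, 39, 33, 17, 37, 23, 39]
Y = [192, 113, 88, 294, 28, 123, 51, 252, 56, 16,
     141, 32, 86, 21, 231, 187, 22, 205, 57, 265]


def fit_log_log(xs, ys):
    """Solve normal equations I and II for log a and b in (log Y)c = log a + b log X."""
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    n = len(xs)
    sum_lx, sum_ly = sum(log_x), sum(log_y)
    sum_lxly = sum(u * v for u, v in zip(log_x, log_y))
    sum_lx2 = sum(u * u for u in log_x)
    # Cramer's rule applied to the two normal equations.
    det = n * sum_lx2 - sum_lx ** 2
    b = (n * sum_lxly - sum_lx * sum_ly) / det
    log_a = (sum_ly - b * sum_lx) / n
    return log_a, b


log_a, b = fit_log_log(X, Y)
print(f"log a = {log_a:.4f}, b = {b:.4f}")
```

The same two-equation solution applies to any of the transformations in this section once the appropriate transformed series are substituted for X and Y.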

Since the estimating equation which we are using is the linear form of

$$Y_c = aX^b,$$

the estimating equation, in terms of the original data, is

$$Y_c = 0.002697X^{3.136656}.$$

(Note that log a = -2.569125 = 7.430875 - 10, and its antilog is
0.002697.) The estimating equation is shown on Chart 20.5, which has
logarithmic scales, and on Chart 20.8, which has arithmetic scales.

TABLE 20.4

Computation of Values Used for Determining Measures of Relationship
Between Logarithm of Diameter and Logarithm of Volume of
Twenty Ponderosa Pine Trees

(Logarithms are obtained from Appendix R.)

Diameter    Volume*
at breast   (board
height      feet
(inches)    ÷ 10)
    X          Y         log X       log Y     log X · log Y   (log X)²    (log Y)²

   36        192       1.556303    2.283301      3.553508      2.422079    5.213463
   28        113       1.447158    2.053078      2.971128      2.094266    4.215129
   28         88       1.447158    1.944483      2.813974      2.094266    3.781014
   41        294       1.612784    2.468347      3.980911      2.601072    6.092737
   19         28       1.278754    1.447158      1.860559      1.635212    2.094266
   32        123       1.505150    2.089905      3.145621      2.265477    4.367703
   22         51       1.342423    1.707570      2.292281      1.802100    2.915795
   38        252       1.579784    2.401401      3.793695      2.495717    5.766727
   25         56       1.397940    1.748188      2.443862      1.954236    3.056161
   17         16       1.230449    1.204120      1.481608      1.514005    1.449906
   31        141       1.491362    2.149219      3.205264      2.224161    4.619142
   20         32       1.301030    1.505150      1.958245      1.692679    2.265477
   25         86       1.397940    1.934498      2.704312      1.954236    3.742283
   19         21       1.278754    1.322219      1.690793      1.635212    1.748263
   39        231       1.591065    2.363612      3.760660      2.531488    5.586662
   33        187       1.518514    2.271842      3.449824      2.305885    5.161266
   17         22       1.230449    1.342423      1.651783      1.514005    1.802100
   37        205       1.568202    2.311754      3.625297      2.459258    5.344207
   23         57       1.361728    1.755875      2.391024      1.854303    3.083097
   39        265       1.591065    2.423246      3.855542      2.531488    5.872121

  569      2,460      28.728012   38.727389     56.619891     41.581145   78.177518

* See note to Table 20.1.

For source of data, see Table 20.1.

Total variation is⁸

$$\sum(\log y)^2 = \sum(\log Y)^2 - (\overline{\log Y})\sum\log Y,$$

where $\overline{\log Y} = \dfrac{\sum\log Y}{N} = \dfrac{38.727389}{20} = 1.93636945$. The numerical
value for total variation is

$$\sum(\log y)^2 = 78.177518 - (1.93636945)(38.727389),$$

$$= 3.186985.$$

⁸ Note that $\sum(\log y)^2 = \sum[\log Y - (\overline{\log Y})]^2 = \sum\left(\log Y - \dfrac{\sum\log Y}{N}\right)^2$. It is not
$\sum[\log(Y - \bar{Y})]^2$. Similarly, $\sum(\log y)_c^2 = \sum[(\log Y)_c - (\overline{\log Y})]^2$ and $\sum(\log y)_s^2 =
\sum[\log Y - (\log Y)_c]^2$.

Chart 20.8. Diameter and Volume of Twenty Ponderosa Pine Trees and
Estimating Equation of Type $(\log Y)_c = \log a + b \log X$, with Zones of ±1,
±2, and ±3 Standard Errors of Estimate, Shown on an Arithmetic Grid.
Data of Table 20.4. Estimating equation shown by solid line. (Vertical axis:
volume, board feet ÷ 10.)

Explained variation is⁹

$$\sum(\log y)_c^2 = \log a\sum\log Y + b\sum(\log X\cdot\log Y) - (\overline{\log Y})\sum\log Y,$$

$$= (-2.569125)(38.727389) + (3.136656)(56.619891) - (1.93636945)(38.727389),$$

$$= 3.111085.$$

Unexplained variation may now be obtained by subtraction:

$$\sum(\log y)_s^2 = \sum(\log y)^2 - \sum(\log y)_c^2,$$

$$= 3.186985 - 3.111085 = 0.075900.$$

The coefficients of determination and correlation are

$$r^2_{\log Y.\log X} = \frac{\sum(\log y)_c^2}{\sum(\log y)^2} = \frac{3.111085}{3.186985} = 0.976 \text{ and}$$

$$r_{\log Y.\log X} = +0.988.$$

We may show a sign for the correlation coefficient, because the relationship
between log Y and log X is linear.

Since only two constants are involved in the estimating equation, we
may compute the coefficient of correlation by using the modified
product-moment formula. It will be recalled that this expression allows us to
obtain the correlation coefficient without first ascertaining the constants
in the estimating equation. For log Y and log X,

$$r_{\log Y.\log X} = \frac{N\sum(\log X\cdot\log Y) - (\sum\log X)(\sum\log Y)}{\sqrt{[N\sum(\log X)^2 - (\sum\log X)^2][N\sum(\log Y)^2 - (\sum\log Y)^2]}},$$

$$= \frac{20(56.619891) - (28.728012)(38.727389)}{\sqrt{[20(41.581145) - (28.728012)^2][20(78.177518) - (38.727389)^2]}},$$

$$= +0.988.$$

The standard error of estimate is

$$s_{\log Y.\log X} = \sqrt{\frac{\sum(\log y)_s^2}{N}} = \sqrt{\frac{0.075900}{20}} = 0.061604.$$

The zones of ±1, 2, and 3 standard errors of estimate are shown on
Charts 20.5 and 20.8. Note that, on Chart 20.8, the zones of scatter
depart more and more from the estimating equation as the value of X
increases. On Chart 20.5, the zones are always equidistant because the
scales are logarithmic.

⁹ If we were computing $\sum(\log y)_c^2$ and $\sum(\log y)_s^2$ from both $(\log Y)_c = \log a +
b \log X$ and $(\log Y)_c = \log a + X \log b$, we would probably wish to distinguish, by
means of symbols or otherwise, between the two methods of obtaining explained
variation and unexplained variation.

It may be well to illustrate the computation of one value and to
show how the standard error of estimate is employed. To ascertain the
value of $(\log Y)_c$ when X = 30 (for which log X = 1.477121), we write

$$(\log Y)_c = -2.569125 + (3.136656)(1.477121),$$

$$= 2.064095.$$

The antilog of this is 115.9, so that $Y_c$ = 115.9 tens of board feet. To
obtain the limits of ± one standard error of estimate, we write

$$\text{antilog}\,[(\log Y)_c \pm s_{\log Y.\log X}] = \text{antilog}\,(2.064095 \pm 0.061604),$$

$$= \text{antilog } 2.002491 \text{ and } 2.125699,$$

$$= 100.6 \text{ and } 133.6 \text{ tens of board feet.}$$

For the limits of ± two standard errors of estimate, we compute

$$\text{antilog}\,[(\log Y)_c \pm 2s_{\log Y.\log X}] = \text{antilog}\,(2.064095 \pm 0.123208),$$

$$= 87.3 \text{ and } 153.9 \text{ tens of board feet.}$$

For the limits of ± three standard errors of estimate:

$$\text{antilog}\,[(\log Y)_c \pm 3s_{\log Y.\log X}] = \text{antilog}\,(2.064095 \pm 0.184812),$$

$$= 75.7 \text{ and } 177.4 \text{ tens of board feet.}$$

In a similar manner, limits may be obtained for estimates of volume
based upon other values of X. It must be remembered, of course, that
the $(\log Y)_c$ value and the $s_{\log Y.\log X}$ value must be combined before
antilogs are looked up in the table. Alternatively, the standard error of
estimate may be applied to the $Y_c$ values in the form of a ratio. For
example,

$$\text{antilog } s_{\log Y.\log X} = \text{antilog } 0.061604 = 1.1524 \text{ and}$$

$$\text{antilog } (-s_{\log Y.\log X}) = \text{antilog } (-0.061604) = \text{antilog } (9.938396 - 10) = 0.8678.$$

Any $Y_c$ values computed from our estimating equation may now be
multiplied by these ratios to obtain the limits of ± one standard error of
estimate. For the case where X = 30 and $Y_c$ = 115.9, we get

$$115.9 \times 1.1524 = 133.6 \text{ and}$$

$$115.9 \times 0.8678 = 100.6 \text{ tens of board feet,}$$

the same values that were obtained before. For limits of ± two or three
standard errors of estimate, the procedure is the same, except that the
initial step involves multiplying $s_{\log Y.\log X}$ by 2 or 3, or the ratios just
obtained may be squared and cubed.
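The measures of variation and the limits just described can be reproduced with the following Python sketch. It takes the sums of Table 20.4 and the constants log a and b as given in the text; the function name `limits` is an assumption of this sketch, and `10 ** value` stands in for looking up an antilog in a table.

```python
import math

N = 20
SUM_LOG_Y = 38.727389        # sum of log Y, Table 20.4
SUM_LOGX_LOGY = 56.619891    # sum of log X * log Y
SUM_LOG_Y_SQ = 78.177518     # sum of (log Y)^2
LOG_A, B = -2.569125, 3.136656

mean_log_y = SUM_LOG_Y / N
total = SUM_LOG_Y_SQ - mean_log_y * SUM_LOG_Y
explained = LOG_A * SUM_LOG_Y + B * SUM_LOGX_LOGY - mean_log_y * SUM_LOG_Y
unexplained = total - explained
r_squared = explained / total
s = math.sqrt(unexplained / N)   # standard error of estimate, in log units


def limits(x, k=1):
    """Return (lower, estimate, upper) in tens of board feet for +/- k standard errors.

    The standard error is combined with (log Y)c BEFORE the antilog is taken.
    """
    log_yc = LOG_A + B * math.log10(x)
    return 10 ** (log_yc - k * s), 10 ** log_yc, 10 ** (log_yc + k * s)


low, est, high = limits(30)
print(f"r^2 = {r_squared:.3f}, s = {s:.6f}")
print(f"X = 30: {low:.1f}, {est:.1f}, {high:.1f}")
# The ratio form gives the same upper limit: estimate times antilog of s.
print(f"ratio check: {est * 10 ** s:.1f}")
```

Because the limits are formed in logarithmic units, the upper and lower limits are not equally distant from the estimate once converted back to board feet.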





The √Y, X relationship. Because the scatter diagram of Chart
20.6 appears to be more nearly linear than does that of Chart 20.5, we
should expect to obtain a higher coefficient of determination or correlation
for the √Y, X relationship than for the log Y, log X relationship. However,
the coefficients which we are about to compute cannot be much
higher than those just obtained, since we found $r^2_{\log Y.\log X} = 0.976$ and
$r_{\log Y.\log X} = +0.988$.

TABLE 20.5

Computation of Values Used for Determining Measures of Relationship
Between Diameter and Square Root of Volume of Twenty Ponderosa
Pine Trees

(Square roots may be obtained from Appendix Q.)

Diameter     Volume*
at breast    (board feet
height       ÷ 10)
(inches)
    X            Y           √Y          X√Y           X²

   36          192         13.86        498.96        1,296
   28          113         10.63        297.64          784
   28           88          9.38        262.64          784
   41          294         17.15        703.15        1,681
   19           28          5.29        100.51          361
   32          123         11.09        354.88        1,024
   22           51          7.14        157.08          484
   38          252         15.87        603.06        1,444
   25           56          7.48        187.00          625
   17           16          4.00         68.00          289
   31          141         11.87        367.97          961
   20           32          5.66        113.20          400
   25           86          9.27        231.75          625
   19           21          4.58         87.02          361
   39          231         15.20        592.80        1,521
   33          187         13.67        451.11        1,089
   17           22          4.69         79.73          289
   37          205         14.32        529.84        1,369
   23           57          7.55        173.65          529
   39          265         16.28        634.92        1,521

  569        2,460        204.98      6,494.91       17,437

* See note to Table 20.1.

For source of data, see Table 20.1.

The estimating equation is of the type

$$(\sqrt{Y})_c = a + bX$$

and the normal equations are

$$\text{I.}\quad \sum\sqrt{Y} = Na + b\sum X;$$

$$\text{II.}\quad \sum X\sqrt{Y} = a\sum X + b\sum X^2.$$

Substituting values from Table 20.5 (squares and square roots are given
in Appendix Q), we have

$$\text{I.}\quad 204.98 = 20a + 569b; \text{ and}$$

$$\text{II.}\quad 6{,}494.91 = 569a + 17{,}437b;$$

which, when solved simultaneously, give

$$a = -4.8587836 \text{ and } b = 0.5310293.$$

The estimating equation, then, is

$$(\sqrt{Y})_c = -4.86 + 0.531X,$$

which is shown on Chart 20.6, where √Y values and X values are plotted,
and on Chart 20.9, on which the Y and X values appear.
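As a check on the solution above, the two normal equations can be solved directly from the totals of Table 20.5. This is a minimal sketch; `solve_normal_equations` is a name chosen here for illustration, and it is not specific to the square-root transformation.

```python
def solve_normal_equations(n, sum_x, sum_x2, sum_t, sum_xt):
    """Solve  sum_t = n*a + b*sum_x  and  sum_xt = a*sum_x + b*sum_x2  for a and b.

    Here t stands for the transformed dependent series (sqrt(Y) in this case).
    """
    det = n * sum_x2 - sum_x ** 2
    b = (n * sum_xt - sum_x * sum_t) / det
    a = (sum_t - b * sum_x) / n
    return a, b


# Totals from Table 20.5: N, sum X, sum X^2, sum sqrt(Y), sum X*sqrt(Y).
a, b = solve_normal_equations(20, 569, 17437, 204.98, 6494.91)
print(f"a = {a:.6f}, b = {b:.7f}")
```

The value of a obtained this way differs from the text's figure in about the sixth decimal place, apparently because the text carries b rounded to seven places when solving for a.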

Total variation is computed from¹⁰

$$\sum(\sqrt{y})^2 = \sum(\sqrt{Y})^2 - \overline{\sqrt{Y}}\sum\sqrt{Y} = \sum Y - \overline{\sqrt{Y}}\sum\sqrt{Y},$$

where $\overline{\sqrt{Y}} = \dfrac{\sum\sqrt{Y}}{N} = \dfrac{204.98}{20} = 10.249$. Total variation is

$$\sum(\sqrt{y})^2 = 2{,}460 - (10.249)(204.98) = 359.1600.$$

Explained variation is

$$\sum(\sqrt{y})_c^2 = a\sum\sqrt{Y} + b\sum X\sqrt{Y} - \overline{\sqrt{Y}}\sum\sqrt{Y},$$

$$= (-4.8587836)(204.98) + (0.5310293)(6{,}494.91) - (10.249)(204.98),$$

$$= 352.1940.$$

Unexplained variation is

$$\sum(\sqrt{y})_s^2 = \sum(\sqrt{y})^2 - \sum(\sqrt{y})_c^2,$$

$$= 359.1600 - 352.1940 = 6.9660.$$

The coefficient of determination is obtained from

$$r^2_{\sqrt{Y}.X} = \frac{\sum(\sqrt{y})_c^2}{\sum(\sqrt{y})^2} = \frac{352.1940}{359.1600} = 0.981.$$

¹⁰ Note that $\sum(\sqrt{y})^2 = \sum(\sqrt{Y} - \overline{\sqrt{Y}})^2 = \sum\left(\sqrt{Y} - \dfrac{\sum\sqrt{Y}}{N}\right)^2$. It is not
$\sum(\sqrt{Y - \bar{Y}})^2$. Similarly, $\sum(\sqrt{y})_c^2 = \sum[(\sqrt{Y})_c - \overline{\sqrt{Y}}]^2$ and $\sum(\sqrt{y})_s^2 =
\sum[\sqrt{Y} - (\sqrt{Y})_c]^2$.

This value is slightly larger than that obtained from use of the
second-degree equation ($r^2_{Y.XX^2} = 0.978$) and also larger than when the
logarithmic estimating equation ($r^2_{\log Y.\log X} = 0.976$) was employed. The
coefficient of correlation is the square root of the coefficient of determination,

$$r_{\sqrt{Y}.X} = +0.990;$$

or, if a and b have not been computed, it may be ascertained from

$$r_{\sqrt{Y}.X} = \frac{N\sum X\sqrt{Y} - (\sum X)(\sum\sqrt{Y})}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y - (\sum\sqrt{Y})^2]}},$$

$$= \frac{20(6{,}494.91) - (569)(204.98)}{\sqrt{[20(17{,}437) - (569)^2][20(2{,}460) - (204.98)^2]}},$$

$$= +0.990.$$

Chart 20.9. Diameter and Volume of Twenty Ponderosa Pine Trees and
Estimating Equation of Type $(\sqrt{Y})_c = a + bX$, with Zones of ±1, ±2, and
±3 Standard Errors of Estimate, Shown on an Arithmetic Grid. Data of
Table 20.5. Estimating equation shown by solid line. (Vertical axis: volume,
board feet ÷ 10; horizontal axis: diameter in inches.)

The standard error of estimate is

$$s_{\sqrt{Y}.X} = \sqrt{\frac{\sum(\sqrt{y})_s^2}{N}} = \sqrt{\frac{6.9660}{20}} = 0.590.$$

The zones of ±1, 2, and 3 standard errors of estimate appear on Charts
20.6 and 20.9. As in the case of the logarithmic relationship, the zones
become wider, in absolute terms, as X increases. This may be seen in
Chart 20.9. On Chart 20.6 the zones are equidistant because √Y
values were plotted.

When X = 30, the value of $Y_c$ is obtained as follows:

$$(\sqrt{Y})_c = -4.86 + (0.531)(30) = 11.07.$$

Since $(\sqrt{Y})_c = 11.07$, $Y_c = (11.07)^2 = 122.5$ tens of board feet. To get
the limits of ± one standard error of estimate, we compute

$$[(\sqrt{Y})_c \pm s_{\sqrt{Y}.X}]^2 = (11.07 \pm 0.59)^2 = 109.8 \text{ and } 136.0 \text{ tens of board feet.}$$

The limits of ± two standard errors of estimate are computed from

$$[(\sqrt{Y})_c \pm 2s_{\sqrt{Y}.X}]^2 = [11.07 \pm 2(0.59)]^2$$

$$= 97.8 \text{ and } 150.1 \text{ tens of board feet.}$$

For the limits of ± three standard errors of estimate,

$$[(\sqrt{Y})_c \pm 3s_{\sqrt{Y}.X}]^2 = [11.07 \pm 3(0.59)]^2$$

$$= 86.5 \text{ and } 164.9 \text{ tens of board feet.}$$

In a similar manner, limits may be computed for other estimates of
volume. It is important to remember that the $(\sqrt{Y})_c$ and the $s_{\sqrt{Y}.X}$
values must be combined before the squares are obtained.
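The computations for the √Y, X relationship can be sketched in Python as follows, using the totals of Table 20.5 and the constants a and b as given in the text. The function name `volume_limits` is an assumption of this sketch. Because the sketch carries full precision instead of the rounded working values 11.07 and 0.59, its limits may differ from those in the text by about 0.1.

```python
import math

N = 20
SUM_Y = 2460.0             # sum of Y, used as the sum of (sqrt Y)^2
SUM_SQRT_Y = 204.98        # sum of sqrt(Y), Table 20.5
SUM_X_SQRT_Y = 6494.91     # sum of X * sqrt(Y)
A, B = -4.8587836, 0.5310293

mean_sqrt_y = SUM_SQRT_Y / N
total = SUM_Y - mean_sqrt_y * SUM_SQRT_Y
explained = A * SUM_SQRT_Y + B * SUM_X_SQRT_Y - mean_sqrt_y * SUM_SQRT_Y
unexplained = total - explained
r_squared = explained / total
s = math.sqrt(unexplained / N)   # standard error of estimate, in sqrt units


def volume_limits(x, k):
    """Return (lower, estimate, upper) in tens of board feet for +/- k standard errors.

    The standard error is combined with (sqrt Y)c BEFORE squaring.
    """
    root = A + B * x
    return (root - k * s) ** 2, root ** 2, (root + k * s) ** 2


print(f"r^2 = {r_squared:.3f}, s = {s:.3f}")
for k in (1, 2, 3):
    low, est, high = volume_limits(30, k)
    print(f"+/-{k} s: {low:.1f}, {est:.1f}, {high:.1f}")
```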

Comparison of the three non-linear relationships for diameter
and volume of trees. Although it is clear that any one of the three
non-linear estimating equations is preferable to the linear equation for
describing the correlation between the diameter and volume of ponderosa
pine trees, it is not at all obvious which one of the three non-linear
equations is superior, since all of them give coefficients of determination which
differ only in the third decimal place. All round to 0.98. It is rather
unusual to find that several equation types give coefficients so nearly
alike that there is little room for choice between them. However, it must
be remembered that, in one sense, the coefficients are not strictly
comparable. The second-degree curve explained 97.8 per cent ($r^2_{Y.XX^2} =
0.978$) of the variation in the Y values. The logarithmic estimating
equation explained 97.6 per cent ($r^2_{\log Y.\log X} = 0.976$) of the variation in
the logarithms of the Y values. The estimating equation using √Y and
X explained 98.1 per cent ($r^2_{\sqrt{Y}.X} = 0.981$) of the variation in the square
roots of the Y values.

The three standard errors of estimate cannot be compared with each
other, since they are in different units. For the second-degree curve, the
standard error of estimate is always 13.2 board feet ÷ 10. When the
logarithmic estimating equation is used, the standard error of estimate is
always 15.2 per cent of the estimate in a positive direction or 13.2 per
cent of the estimate in a negative direction. As pointed out in Chapter
19, the standard error of estimate is an over-all measure of the dispersion
of actual values from estimated values, which is nevertheless applied to
specific estimates. Table 20.6 shows estimates of volume of ponderosa
pine trees made by each of the three non-linear methods and the
amount of error represented by one standard error of estimate in each
direction, when X = 18, 30, and 40. Estimates made by the second-degree
curve and by the √Y, X relationship are not much different; all
three equations give about the same estimate of volume when X = 18.
In absolute terms, the error is constant whether X is large or small, when
the second-degree equation is used; for either of the other two equation
types, the error becomes greater as X increases. For small values of X,
the logarithmic relationship shows the smallest errors; while for large
values of X, the second-degree curve shows the smallest errors. The
√Y, X relationship is generally intermediate between these two.

One criterion that has been suggested for comparing the suitability of
different equation types consists of computing a $Y_c$ value for each
observed value of X and calculating

$$\sqrt{\frac{\sum(Y - Y_c)^2}{N}}.$$

This is $s_{Y.XX^2}$ for the second-degree equation, and, since the least-squares
fit minimized $\sum(Y - Y_c)^2$, the value of $s_{Y.XX^2} = 13.2$ would be expected
to be smallest. It is somewhat surprising that the √Y, X relationship,
which involved a least-squares fit to the √Y values, also gives 13.2 as the
standard deviation of the Y values around the $Y_c$ values. For the logarithmic
relationship, which involved a least-squares fit to the log Y values, the
standard deviation of the Y values around the $Y_c$ values is 14.9. In each
instance, the unit is tens of board feet.

Another criterion consists of undertaking to ascertain the estimating
equation around which the Y values are most nearly normally distributed.
Since N is only 20, this hardly seems appropriate for this example.
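The first criterion can be illustrated with a short Python sketch that fits the logarithmic and the √Y, X equations to the twenty trees and then measures the scatter of the actual Y values about the computed $Y_c$ values in original units. The helper names `fit_line` and `scatter_about_estimates` are assumptions of this sketch; the second-degree curve is omitted because its constants are not given in this section, and the transformed values are computed in floating point rather than taken from the appendix tables.

```python
import math

# Diameter X (inches) and volume Y (board feet / 10), as in Tables 20.4 and 20.5.
X = [36, 28, 28, 41, 19, 32, 22, 38, 25, 17, 31, 20, 25, 19, 39, 33, 17, 37, 23, 39]
Y = [192, 113, 88, 294, 28, 123, 51, 252, 56, 16,
     141, 32, 86, 21, 231, 187, 22, 205, 57, 265]


def fit_line(u, v):
    """Least-squares constants (a, b) for v = a + b*u, from the two normal equations."""
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(p * q for p, q in zip(u, v))
    suu = sum(p * p for p in u)
    b = (n * suv - su * sv) / (n * suu - su * su)
    a = (sv - b * su) / n
    return a, b


def scatter_about_estimates(actual, computed):
    """Square root of the mean squared deviation of Y from Yc, in original units."""
    n = len(actual)
    return math.sqrt(sum((y - yc) ** 2 for y, yc in zip(actual, computed)) / n)


# Logarithmic equation: fit in log units, convert estimates back by antilog.
log_a, b_log = fit_line([math.log10(x) for x in X], [math.log10(y) for y in Y])
yc_log = [10 ** (log_a + b_log * math.log10(x)) for x in X]

# Square-root equation: fit in sqrt units, convert estimates back by squaring.
a_root, b_root = fit_line(X, [math.sqrt(y) for y in Y])
yc_root = [(a_root + b_root * x) ** 2 for x in X]

scatter_log = scatter_about_estimates(Y, yc_log)
scatter_root = scatter_about_estimates(Y, yc_root)
print(f"logarithmic: {scatter_log:.1f}   square root: {scatter_root:.1f}")
```

Neither fit minimizes this quantity directly, since each least-squares fit was made to the transformed Y values.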

TABLE 20.6

Estimates of Volume of Ponderosa Pine Trees and Zones of ± One Standard
Error of Estimate for Three Equation Types When X = 18, 30,
and 40 Inches

(The values in the body of the table are board feet ÷ 10.)

                      X = 18 inches              X = 30 inches              X = 40 inches
Estimating       Negative         Positive  Negative         Positive  Negative         Positive
equation          error     Yc     error     error     Yc     error     error     Yc     error

Second-degree      13.2    22.5     13.2      13.2   122.1     13.2      13.2   268.9     13.2
Logarithmic         3.0    23.3      3.6      15.3   115.9     17.7      37.8   285.8     43.5
√Y, X               5.2    22.1      5.9      12.7   122.5     13.5      19.0   268.3     19.7


As indicated at the outset, there is little basis for choice among the
three non-linear equation types. Perhaps the information presented in
the preceding paragraphs, together with the logical implication of the
√Y, X relationship, mentioned on page 504, may cause one to be inclined
to choose it. When several procedures are of about equal merit, it is not
inappropriate to choose the simplest one or the one which is easiest to
compute. On this basis, too, we might select the √Y, X relationship.

The log Y, X relationship. When correlating logarithms of Y
values with X values, the estimating equation is of the type

$$(\log Y)_c = \log a + X \log b.$$

The normal equations are

$$\text{I.}\quad \sum\log Y = N\log a + \log b\sum X;$$

$$\text{II.}\quad \sum(X\cdot\log Y) = \log a\sum X + \log b\sum X^2.$$

Total variation is¹¹

$$\sum(\log y)^2 = \sum(\log Y)^2 - (\overline{\log Y})\sum\log Y;$$

explained variation is¹²

$$\sum(\log y)_c^2 = \log a\sum\log Y + \log b\sum(X\cdot\log Y) - (\overline{\log Y})\sum\log Y; \text{ and}$$

unexplained variation is

$$\sum(\log y)_s^2 = \sum(\log y)^2 - \sum(\log y)_c^2.$$

¹¹ See note 8.
¹² See note

The coefficient of determination may be obtained from

$$r^2_{\log Y.X} = \frac{\sum(\log y)_c^2}{\sum(\log y)^2}.$$

The coefficient of correlation is, of course, the square root of the coefficient
of determination. If log a and log b are not needed, $r_{\log Y.X}$ may be
computed from

$$r_{\log Y.X} = \frac{N\sum(X\cdot\log Y) - (\sum X)(\sum\log Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum(\log Y)^2 - (\sum\log Y)^2]}}.$$

The standard error of estimate is

$$s_{\log Y.X} = \sqrt{\frac{\sum(\log y)_s^2}{N}}.$$

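The 1/Y, X relationship. For this relationship, the estimating equation
is of the type

$$\left(\frac{1}{Y}\right)_c = a + bX.$$

The normal equations are

$$\text{I.}\quad \sum\frac{1}{Y} = Na + b\sum X;$$

$$\text{II.}\quad \sum\frac{X}{Y} = a\sum X + b\sum X^2.$$

Total variation is¹³

$$\sum\left(\frac{1}{y}\right)^2 = \sum\left(\frac{1}{Y}\right)^2 - \overline{\left(\frac{1}{Y}\right)}\sum\frac{1}{Y}.$$

¹³ Note that $\sum\left(\dfrac{1}{y}\right)^2 = \sum\left[\dfrac{1}{Y} - \overline{\left(\dfrac{1}{Y}\right)}\right]^2 = \sum\left(\dfrac{1}{Y} - \dfrac{\sum\frac{1}{Y}}{N}\right)^2$. It is not $\sum\left[\dfrac{1}{Y - \bar{Y}}\right]^2$.
Similarly, $\sum\left(\dfrac{1}{y}\right)_c^2 = \sum\left[\left(\dfrac{1}{Y}\right)_c - \overline{\left(\dfrac{1}{Y}\right)}\right]^2$ and $\sum\left(\dfrac{1}{y}\right)_s^2 = \sum\left[\dfrac{1}{Y} - \left(\dfrac{1}{Y}\right)_c\right]^2$.

Explained variation is

$$\sum\left(\frac{1}{y}\right)_c^2 = a\sum\frac{1}{Y} + b\sum\frac{X}{Y} - \overline{\left(\frac{1}{Y}\right)}\sum\frac{1}{Y},$$

and unexplained variation is

$$\sum\left(\frac{1}{y}\right)_s^2 = \sum\left(\frac{1}{y}\right)^2 - \sum\left(\frac{1}{y}\right)_c^2.$$

The coefficient of determination may be computed from

$$r^2_{\frac{1}{Y}.X} = \frac{\sum\left(\frac{1}{y}\right)_c^2}{\sum\left(\frac{1}{y}\right)^2},$$

and the square root is $r_{\frac{1}{Y}.X}$. Alternatively, the correlation coefficient
may be had from

$$r_{\frac{1}{Y}.X} = \frac{N\sum\frac{X}{Y} - (\sum X)\left(\sum\frac{1}{Y}\right)}{\sqrt{[N\sum X^2 - (\sum X)^2]\left[N\sum\left(\frac{1}{Y}\right)^2 - \left(\sum\frac{1}{Y}\right)^2\right]}},$$

which does not call for the values of a and b. The standard error of
estimate is

$$s_{\frac{1}{Y}.X} = \sqrt{\frac{\sum\left(\frac{1}{y}\right)_s^2}{N}}.$$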

THE CORRELATION RATIO, η

When data are arranged in a correlation table as in Table 20.7 and
when a non-linear relationship is present, it is sometimes of interest to
know the value of the correlation coefficient which would result if the
arithmetic means of the columns were used instead of an estimating
equation. Chart 20.10 shows, by the use of horizontal lines, the column
means of Table 20.7. It also shows, for purposes of comparison, a
second-degree curve fitted to the data. The measure of correlation, based upon
the means of the columns, is $\eta_{Y.X}$, the correlation ratio. It is similar to
the correlation coefficients that we have already discussed in that it is the
square root of the proportion of the total variation in the Y series that
has been explained by the variation of the column means.¹⁴

¹⁴ There is also a correlation ratio, $\eta_{X.Y}$, which is the square root of the proportion
of the total variation in the X series that has been explained by the variation of the
row means.

Chart 20.10. Yield per Acre and Man-Hours per Ton Required
to Harvest Broom Corn in East-Central Illinois. Horizontal lines
indicate average man-hours per ton for each yield, while curve represents
computed values from equation $Y_c$ = 32^6794 - 0.5658420X
+ 0.0003275010X². This equation was computed on pp. 721-725 of the
first edition of this text. Data from source given below Table 20.7.
(Vertical axis: man-hours per ton; horizontal axis: yield per acre in tons.)

That is,

$$\eta_{Y.X} = \sqrt{\frac{\text{variation explained by column means}}{\text{total variation of the } Y \text{ series}}},$$

or, in symbols,¹⁵


$$\eta^2_{Y.X} = \frac{\sum_1^k\left[N_c(\bar{Y}_c - \bar{Y})^2\right]}{\sum(Y - \bar{Y})^2} = \frac{\sum_1^k\left[\dfrac{\left(\sum_1^{N_c} Y\right)^2}{N_c}\right] - \bar{Y}\sum Y}{\sum Y^2 - \bar{Y}\sum Y} = \frac{\sum_1^k\left[\dfrac{\left(\sum_1^{N_c} Y\right)^2}{N_c}\right] - \dfrac{(\sum Y)^2}{N}}{\sum Y^2 - \dfrac{(\sum Y)^2}{N}},$$

where $\bar{Y}_c$ is the arithmetic mean of a column,

$N_c$ is the number of items in a column,

$\sum_1^{N_c}$ indicates a summation over the $N_c$ items in a column, and

$\sum_1^k$ indicates a summation over the k columns.

Since the data of a correlation table are in terms of class intervals, this
expression must be rewritten as for a frequency distribution or as for a
correlation coefficient computed from a correlation table. The expression
becomes

$$\eta^2_{Y.X} = \frac{\sum_1^k\left[\dfrac{\left(\sum_1^{N_c} f d'_Y\right)^2}{N_c}\right] - \dfrac{(\sum f_Y d'_Y)^2}{N}}{\sum f_Y (d'_Y)^2 - \dfrac{(\sum f_Y d'_Y)^2}{N}}.$$

Substituting the values from Table 20.10 gives

$$\eta^2_{Y.X} = \frac{150.60065 - \dfrac{(16)^2}{103}}{220 - \dfrac{(16)^2}{103}} = \frac{148.115}{217.515} = 0.681,$$

indicating that 68 per cent of the variation in man hours (the Y variable)
has been explained by the use of the column means. The correlation
ratio is the square root of this value, so

$$\eta_{Y.X} = \sqrt{0.681} = 0.825.$$

¹⁵ Proof of the equality of the first and last of the three expressions follows that
shown in Appendix S, section 26.1.

...ture, Technical Bulletin No. 349, February 1933.

The correlation ratio has no sign, since the relationship is not necessarily 
positive, or negative, for all values of the two series with which one may 
be dealing. Furthermore, the horizontal axis may represent qualitative 
categories rather than numerical values. 

The correlation ratio is of interest primarily because of its relationship 
to the curvilinear correlation coefficient. The correlation ratio is always 
equal to or larger than the correlation coefficient obtained by use of a 
curve fitted to the grouped data, provided the number of constants in the 
equation is equal to or smaller than the number of columns used in 
computing $\eta_{Y.X}$. Both $\eta_{Y.X}$ and the curvilinear correlation coefficient
become larger as the number of columns or the number of constants in
the equation is increased.

There are several limitations to the usefulness of the correlation ratio. 
First, the data must be grouped — not necessarily on both axes, but the 
independent variable must be grouped. Second, if the number of groups 
for the independent variable is increased, the value of the correlation 
ratio increases, becoming 1.0 if the groups become so numerous that there 
is only one observation in each group. Third, there is no estimating 
equation, and therefore no satisfactory way of making estimates of the 
dependent variable. 
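
The second limitation can be checked numerically. What follows is only a minimal Python sketch of the first form of the expression for the squared correlation ratio (variation of the column means divided by total variation). The function name `correlation_ratio_squared` and the small data set are hypothetical and are not taken from Table 20.10.

```python
from collections import defaultdict


def correlation_ratio_squared(columns, values):
    """Squared correlation ratio: variation of column means / total variation.

    columns -- the column (group) label of each observation
    values  -- the Y value of each observation
    """
    by_column = defaultdict(list)
    for label, y in zip(columns, values):
        by_column[label].append(y)

    grand_mean = sum(values) / len(values)
    total = sum((y - grand_mean) ** 2 for y in values)
    # N_c * (column mean - grand mean)^2, summed over the k columns
    between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
        for ys in by_column.values()
    )
    return between / total


# Hypothetical illustration: three columns of two observations each.
y = [1, 3, 4, 6, 8, 10]
grouped = correlation_ratio_squared(["a", "a", "b", "b", "c", "c"], y)

# One observation per column: the ratio rises to 1.0.
ungrouped = correlation_ratio_squared(["a", "b", "c", "d", "e", "f"], y)
print(grouped, ungrouped)
```

With every observation in its own column, each column mean equals the observation itself, so the numerator and the denominator coincide.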



Symbols Used in Chapter 21 


For the symbols used in the first paragraph of this chapter, see the list 
accompanying Chapter 19 . 

$a_{1.2}$: value of $X_{c1.2}$ when $X_2 = 0$ in the estimating equation
$X_{c1.2} = a_{1.2} + b_{12}X_2$. Same as $a$ in the estimating equation
$Y_c = a + bX$ used in Chapter 19.

$a_{1.3}$: value of $X_{c1.3}$ when $X_3 = 0$ in the estimating equation
$X_{c1.3} = a_{1.3} + b_{13}X_3$.

$a_{1.23}$: value of $X_{c1.23}$ when $X_2 = 0$ and $X_3 = 0$ in the estimating
equation $X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3$.

$a_{1.24}$: value of $X_{c1.24}$ when $X_2 = 0$ and $X_4 = 0$ in the estimating
equation $X_{c1.24} = a_{1.24} + b_{12.4}X_2 + b_{14.2}X_4$.

$a_{1.34}$: value of $X_{c1.34}$ when $X_3 = 0$ and $X_4 = 0$ in the estimating
equation $X_{c1.34} = a_{1.34} + b_{13.4}X_3 + b_{14.3}X_4$.

$a_{1.234 \ldots m}$: value of $X_{c1.234 \ldots m}$ when $X_2, X_3, X_4, \ldots, X_m$
equal zero in the estimating equation
$X_{c1.234 \ldots m} = a_{1.234 \ldots m} + b_{12.34 \ldots m}X_2 + b_{13.24 \ldots m}X_3 + b_{14.23 \ldots m}X_4 + \cdots + b_{1m.23 \ldots (m-1)}X_m$.

$a_{1.22'3}$: value of $X_{c1.22'3}$ when $X_2$, $X_2^2$, and $X_3$ equal zero in the
estimating equation
$X_{c1.22'3} = a_{1.22'3} + b_{12.2'3}X_2 + b_{12'.23}X_2^2 + b_{13.22'}X_3$.

$b_{12}$: coefficient of $X_2$ in the estimating equation $X_{c1.2} = a_{1.2} + b_{12}X_2$.
Same as $b$ in Chapter 19.

$b_{13}$: coefficient of $X_3$ in the estimating equation $X_{c1.3} = a_{1.3} + b_{13}X_3$.

$b_{12.3}$: coefficient of $X_2$ in the estimating equation
$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3$.

$b_{13.2}$: coefficient of $X_3$ in the estimating equation
$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3$.

$b_{12.4}$, $b_{14.2}$: coefficients, respectively, of $X_2$ and $X_4$ in the estimating
equation shown above for $a_{1.24}$.

$b_{13.4}$, $b_{14.3}$: coefficients, respectively, of $X_3$ and $X_4$ in the estimating
equation shown above for $a_{1.34}$.

$b_{12.34}$: coefficient of $X_2$ in the estimating equation
$X_{c1.234} = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4$.

$b_{13.24}$: coefficient of $X_3$ in the estimating equation
$X_{c1.234} = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4$.

$b_{14.23}$: coefficient of $X_4$ in the estimating equation
$X_{c1.234} = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4$.

$b_{12.34 \ldots m}$, $b_{13.24 \ldots m}$, $b_{14.23 \ldots m}$, $\ldots$, $b_{1m.23 \ldots (m-1)}$:
coefficients, respectively, of $X_2, X_3, X_4, \ldots, X_m$ in the estimating
equation given above for $a_{1.234 \ldots m}$.

$b_{12.2'3}$, $b_{12'.23}$, $b_{13.22'}$: coefficients, respectively, of $X_2$, $X_2^2$, and
$X_3$ in the estimating equation given above for $a_{1.22'3}$.

$b_{21}$: coefficient of $X_1$ in the estimating equation $X_{c2.1} = a_{2.1} + b_{21}X_1$.
Used in this chapter only to assist in the computation of ^12.34. 

$\beta_{12.34}$, $\beta_{13.24}$, $\beta_{14.23}$: lower-case Greek beta; beta coefficients
which represent one way of measuring the individual importance of,
respectively, the variables $X_2$, $X_3$, and $X_4$. $\beta_{1m.23 \ldots (m-1)}$ is the
generalized form for measuring the importance of $X_m$.

^L.34? dig.24, ^14.23: coefficients of separate, determination. One way of 
measuring the individual importance of, respectively, X2, X3, and X4. 

23..‘(w-i) is the generalized form for measuring the importance of 
X,,; 

N: the number of items in a sample. In multiple or partial correlation, 
N is the number of sets of observations. 

$r_{12}^2$: coefficient of determination for $X_1$ and $X_2$.

$r_{13}^2$: coefficient of determination for $X_1$ and $X_3$.

$r_{14}^2$: coefficient of determination for $X_1$ and $X_4$.

$r_{23}^2$: coefficient of determination for $X_2$ and $X_3$.

$r_{24}^2$: coefficient of determination for $X_2$ and $X_4$.

$r_{34}^2$: coefficient of determination for $X_3$ and $X_4$.

$r_{12.3}^2$: coefficient of partial determination; the additional variation in $X_1$
explained by $X_2$, expressed as a proportion of the variation in $X_1$ which
was unexplained by $X_3$.

$r_{13.2}^2$: coefficient of partial determination; the additional variation in $X_1$
explained by $X_3$, expressed as a proportion of the variation in $X_1$ which
was unexplained by $X_2$.

$r_{12.4}$, $r_{13.4}$, $r_{14.2}$, $r_{14.3}$, $r_{24.3}$, $r_{34.2}$: coefficients of partial
correlation, used in this chapter to assist in computing various other measures.

$r_{12.34}^2$: coefficient of partial determination; the additional variation in $X_1$
explained by $X_2$, expressed as a proportion of the variation in $X_1$ which
was unexplained by $X_3$ and $X_4$.

$r_{13.24}^2$: coefficient of partial determination; the additional variation in $X_1$
explained by $X_3$, expressed as a proportion of the variation in $X_1$ which
was unexplained by $X_2$ and $X_4$.

$r_{14.23}^2$: coefficient of partial determination; the additional variation in $X_1$
explained by $X_4$, expressed as a proportion of the variation in $X_1$
which was unexplained by $X_2$ and $X_3$.

$r_{12.34 \ldots m}^2$: a general form of the coefficient of partial determination; the
additional variation in $X_1$ explained by $X_2$, expressed as a proportion
of the variation in $X_1$ which was unexplained by $X_3, X_4, \ldots, X_m$.

$r_{1m.23 \ldots (m-1)}^2$: a general form of the coefficient of partial determination;
the additional variation in $X_1$ explained by $X_m$, expressed as a proportion
of the variation in $X_1$ which was unexplained by $X_2, X_3, \ldots, X_{(m-1)}$.

$r_{1m.23 \ldots (m-2)}$, $r_{1(m-1).23 \ldots (m-2)}$, $r_{m(m-1).23 \ldots (m-2)}$: general forms of
coefficients of partial correlation used in this chapter to compute
$r_{1m.23 \ldots (m-1)}$. Note that the three coefficients are one order below the
one being computed; the first excludes $X_{(m-1)}$, the second excludes $X_m$,
and the third excludes $X_1$.

$r_{1(34).2}^2$: multiple-partial coefficient of determination; the additional
variation in $X_1$ explained by $X_3$ and $X_4$, expressed as a proportion of the
variation in $X_1$ which was unexplained by $X_2$.

$R_{1.23}^2$: coefficient of multiple determination; the proportion of variation in
$X_1$ which was explained by $X_2$ and $X_3$.

$R_{1.24}^2$: coefficient of multiple determination; the proportion of variation in
$X_1$ which was explained by $X_2$ and $X_4$.

$R_{1.34}^2$: coefficient of multiple determination; the proportion of variation
in $X_1$ which was explained by $X_3$ and $X_4$.

$R_{1.234}^2$: coefficient of multiple determination; the proportion of variation
in $X_1$ which was explained by $X_2$, $X_3$, and $X_4$.

$R_{1.234 \ldots m}^2$: a general form of the coefficient of multiple determination; the
proportion of variation in $X_1$ which was explained by $X_2, X_3, X_4, \ldots, X_m$.

$R_{1.234 \ldots (m-1)}^2$: a general form of the coefficient of multiple determination
used to assist in the computation of $r_{1m.23 \ldots (m-1)}^2$; the proportion of
variation in $X_1$ which was explained by $X_2, X_3, X_4, \ldots, X_{(m-1)}$.

$R_{1.34 \ldots m}^2$: a general form of the coefficient of multiple determination used
to assist in the computation of $r_{12.34 \ldots m}^2$; the proportion of variation in
$X_1$ which was explained by $X_3, X_4, \ldots, X_m$.

$s_1, s_2, s_3, s_4, \ldots$: respectively, the standard deviations of the $X_1, X_2,
X_3, X_4, \ldots$ series.

$s_{1.2}$: standard error of estimate for the estimating equation
$X_{c1.2} = a_{1.2} + b_{12}X_2$. Same as $s_{Y.X}$ in Chapter 19.

$s_{1.3}$: standard error of estimate for the estimating equation
$X_{c1.3} = a_{1.3} + b_{13}X_3$.

$s_{1.23}$: standard error of estimate for the estimating equation
$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3$.

$s_{1.24}$: standard error of estimate for the estimating equation
$X_{c1.24} = a_{1.24} + b_{12.4}X_2 + b_{14.2}X_4$.

$s_{1.34}$: standard error of estimate for the estimating equation
$X_{c1.34} = a_{1.34} + b_{13.4}X_3 + b_{14.3}X_4$.

$s_{1.234}$: standard error of estimate for the estimating equation
$X_{c1.234} = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4$.

$s_{1.234 \ldots m}$: a general form of the standard error of estimate for the
estimating equation given above for $a_{1.234 \ldots m}$.

^ general form of the standard error of estimate used to 
assist in computing $b_{1m.23 \ldots (m-1)}$.

$\sum$: upper-case Greek sigma, meaning "take the sum of."

$\sum x_1^2$: total variation of the $X_1$ values.

$\sum x_{c1.2}^2$, $\sum x_{c1.3}^2$, $\sum x_{c1.4}^2$: variation of $X_1$ explained, respectively,
by $X_2$, by $X_3$, and by $X_4$.

$\sum x_{c1.23}^2$, $\sum x_{c1.24}^2$, $\sum x_{c1.34}^2$: variation of $X_1$ explained, respectively,
by $X_2$ and $X_3$, by $X_2$ and $X_4$, and by $X_3$ and $X_4$.

$\sum x_{c1.234}^2$: variation of $X_1$ explained by $X_2$, $X_3$, and $X_4$.

$\sum x_{c1.234 \ldots m}^2$: a general form for explained variation; the variation of $X_1$
explained by $X_2, X_3, X_4, \ldots, X_m$.

$\sum x_{c1.234 \ldots (m-1)}^2$, $\sum x_{c1.34 \ldots m}^2$: general forms for explained variation; the
variation of $X_1$ explained, respectively, by $X_2, X_3, X_4, \ldots, X_{(m-1)}$
and by $X_3, X_4, \ldots, X_m$. Used to assist in computing $r_{1m.23 \ldots (m-1)}^2$
and $r_{12.34 \ldots m}^2$.

$\sum x_{s1.2}^2$, $\sum x_{s1.3}^2$, $\sum x_{s1.4}^2$: variation of $X_1$ unexplained, respectively,
by $X_2$, by $X_3$, and by $X_4$.

$\sum x_{s1.23}^2$, $\sum x_{s1.24}^2$, $\sum x_{s1.34}^2$: variation of $X_1$ unexplained, respectively,
by $X_2$ and $X_3$, by $X_2$ and $X_4$, and by $X_3$ and $X_4$.

$\sum x_{s1.234}^2$: variation of $X_1$ unexplained by $X_2$, $X_3$, and $X_4$.

$\sum x_{s1.234 \ldots m}^2$: a general form for unexplained variation; the variation of $X_1$
unexplained by $X_2, X_3, X_4, \ldots, X_m$.

$\sum x_{s1.234 \ldots (m-1)}^2$, $\sum x_{s1.34 \ldots m}^2$: general forms for unexplained variation; the
variation of $X_1$ unexplained, respectively, by $X_2, X_3, X_4, \ldots, X_{(m-1)}$
and by $X_3, X_4, \ldots, X_m$. Used to assist in computing $r_{1m.23 \ldots (m-1)}^2$
and $r_{12.34 \ldots m}^2$.

$x_1, x_2, x_3, x_4, \ldots, x_m$: values in the $X_1, X_2, X_3, X_4, \ldots, X_m$ series
expressed as deviations from their respective arithmetic means.

$x_{c1}$: see $\sum x_{c1}^2$ with various additional subscripts.

$x_{s1}$: see $\sum x_{s1}^2$ with various additional subscripts.

$X_1$: the $X_1$ series, also an observed value in the $X_1$ series. Thus, we refer
to correlating $X_1$ with $X_2$, $X_3$, and $X_4$, but $\sum X_1$ means "take the sum
of the values in the $X_1$ series."

$X_2, X_3, X_4, \ldots, X_m$: respectively, the $X_2, X_3, X_4, \ldots, X_m$ series; also
observed values in those series. See $X_1$.

$\bar{X}_1, \bar{X}_2, \bar{X}_3, \bar{X}_4, \ldots, \bar{X}_m$: respectively, the arithmetic means of the
$X_1, X_2, X_3, X_4, \ldots, X_m$ series.

$X_{c1.2}$: a computed value of the $X_1$ series when the estimating equation
$X_{c1.2} = a_{1.2} + b_{12}X_2$ is used. Same as $Y_c$ in Chapter 19.

$X_{c1.3}$: a computed value of the $X_1$ series when the estimating equation
$X_{c1.3} = a_{1.3} + b_{13}X_3$ is used.

$X_{c1.23}$: a computed value of the $X_1$ series when the estimating equation
$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3$ is used.

$X_{c1.24}$: a computed value of the $X_1$ series when the estimating equation
shown above for $a_{1.24}$ is used.

$X_{c1.34}$: a computed value of the $X_1$ series when the estimating equation
shown above for $a_{1.34}$ is used.

$X_{c1.234}$: a computed value of the $X_1$ series when the estimating equation
$X_{c1.234} = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4$ is used.

$X_{c1.234 \ldots m}$: a computed value of the $X_1$ series when the estimating
equation shown above for $a_{1.234 \ldots m}$ is used.

$X_{c1.22'3}$: a computed value of the $X_1$ series when the estimating equation
shown above for $a_{1.22'3}$ is used.



CHAPTER 21 

Correlation III: Multiple and Partial 
Correlation 


PRELIMINARY EXPLANATION

Simple correlation. Before plunging into the discussion of multiple 
and partial correlation, it will be useful to review briefly the elementary 
principles of two-variable linear correlation, since the more refined 
measures involve simply an extension of the procedures already discussed. 
First, an estimating equation of the type 

$$Y_c = a + bX$$

was computed by the method of least squares. This permitted us to 
make estimates of the value of the dependent variable from values of the 
independent variable. Next, it was demonstrated that the total varia- 
tion of the dependent variable was the sum of: (1) the explained variation 
and (2) the variation which we had failed to explain by our hypothesis;
that is, that

$$\sum y^2 = \sum y_c^2 + \sum y_s^2.$$


It should be remembered that we computed $\sum y^2$ by the formula

$$\sum y^2 = \sum Y^2 - \bar{Y}\sum Y;$$

and that $\sum y_c^2$ was computed from the expression

$$\sum y_c^2 = \sum Y_c^2 - \bar{Y}\sum Y,$$

in which

$$\sum Y_c^2 = a\sum Y + b\sum XY,$$

or, more simply,

$$\sum y_c^2 = b\sum xy.$$

The standard error of estimate $s_{Y.X}$, which is $\sqrt{\sum y_s^2 / N}$, enabled us to judge
the range of error of our estimates of the dependent variable. $\sum y_s^2$ was
obtained by subtracting the explained variation from the total variation;
that is,

$$\sum y_s^2 = \sum y^2 - \sum y_c^2.$$


Finally, a measure was computed that permitted us to state the proportion
of total variation which had been explained by variations in the computed
values of the dependent variable. This ratio,

$$r^2 = \frac{\sum y_c^2}{\sum y^2},$$

was known as the coefficient of determination, and its square root was
called the coefficient of correlation.

Multiple correlation. Exactly the same principles are involved in 
multiple correlation as in simple correlation, but the procedure is more
laborious, since there is more than one independent variable. Also, it is 
necessary to use slightly different symbols. The illustration in this 
chapter will deal with the relationship between suicide rates by regions, 
and average age, per cent male, and business-failure rate in those same 
regions. Suicide rate is the dependent variable, and the other three are 
independent variables. 

To simplify computations so that they can be shown in full in this 
chapter, the United States has been divided into 19 regions of substan- 
tially equal population and more or less homogeneous characteristics. 
With the exception of New York State, which has been divided into New 
York City and upstate New York, the boundaries of these regions follow 
state boundaries. The composition of the different regions can be 
observed by reference to Table 21.1. Selection of homogeneous areas of 
equal population serves to make the statistical results more meaningful 
in that each region is given proper weight in the calculations. On the 
other hand, use of only 19 observations with an equation of 4 constants 
does make the degrees of freedom (see the section in Chapter 26 dealing
with the significance of multiple-correlation coefficients) rather small.
The results obtained must therefore be regarded as primarily of
illustrative importance.

It simplifies the notation somewhat if, instead of using different letters,
each of the variables is designated by the letter $X$, differentiating between
the variables by means of subscripts. This is particularly true if the
number of variables is large. We shall therefore designate our variables
in this manner:
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Dependent Variable: 

Suicide rate . . . . . . . . . . $X_1$

Independent Variables:

Average age . . . . . . . . . . $X_2$

Per cent male . . . . . . . . . $X_3$

Business-failure rate . . . . . $X_4$


It is interesting to note that, of our three independent variables, two
relate to characteristics of the population while only one, business-failure
rate, can be thought of as a possible cause. Whatever the causes of


TABLE 21.1

Relatively Homogeneous Regions in the United States
of Approximately Equal Population in 1950

Region    Population
number    in millions   States included

   1          6.5       Maine, New Hampshire, Vermont, Massachusetts
   2          7.6       Rhode Island, Connecticut, New Jersey
   3          7.9       New York City
   4          6.9       New York, excluding New York City
   5         10.5       Pennsylvania
   6          7.9       Ohio
   7         10.3       Indiana, Michigan
   8          8.7       Illinois
   9          6.4       Wisconsin, Minnesota
  10          6.6       Iowa, Missouri
  11          5.8       North Dakota, South Dakota, Nebraska, Kansas, Colorado
  12         10.8       Delaware, Maryland, District of Columbia, Virginia,
                        North Carolina
  13          8.3       South Carolina, Georgia, Florida
  14          8.2       West Virginia, Kentucky, Tennessee
  15          7.9       Alabama, Mississippi, Louisiana
  16          5.6       Arizona, New Mexico, Arkansas, Oklahoma
  17          6.2       Montana, Idaho, Wyoming, Washington, Oregon, Utah,
                        Nevada
  18         10.6       California
  19          7.7       Texas


suicides, it is reasonable to conjecture that they do not affect each age 
and sex with equal intensity. 

In the pages that follow, we shall start with variables 1, 2, and 3, and,
after explaining the basic concepts and computations, variable 4 will be
introduced. General formulas for $m$ variables will then be given.

The first step in the correlation procedure is to obtain an equation
which includes both of the independent variables as a means of estimating
a suicide rate for any region. The estimate is labeled $X_{c1.23}$,
since it is an estimate of variable $X_1$ computed from variables $X_2$ and
$X_3$. Since there are two independent variables, there will be two $b$'s.
The equation type will be

$$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3.$$


A word concerning the meaning of the $b$'s and their subscripts is necessary.
These net coefficients of estimation indicate the effect on $X_1$ of a
change in the accompanying independent variable when allowance has
been made¹ for the other independent variable. Thus, $b_{12.3}$ is an estimate
of the variation in suicide rate associated with a variation in average age,
independent of variation in per cent male. The social scientist is accustomed
to saying "other things being equal." The other thing which is
held equal in this instance is the proportion of males in the different
regions. As between regions that have the same percentage of males but
differ with respect to age, each variation of one year in average age
between regions will normally be accompanied by a variation of $b_{12.3}$ in
suicide rate. The other $b$ coefficient in the estimating equation is interpreted
analogously, the figure to the right of the decimal point in the
subscript indicating the factor that is held constant. Of course, really
to know the effect on suicides of age alone, we should hold constant all
other factors, not just per cent male. As we introduce more and more
variables, this desirable situation is more and more closely approximated.
The constant $a_{1.23}$ is the hypothetical value for suicide rate when the
other factors considered have a value of zero. The estimate of suicide
rate for any region is the sum of the net amounts associated with each
independent variable plus the value for $a$.

We might observe at this point that the natural scientist can often
design his experiment so as to control a number of the variables, such, for
instance, as temperature, humidity, or air pressure. The biologist and
the agricultural experimenter can control their variables to a considerable 
extent. On the other hand, economics and sociology, and most of the 


¹ Technically, allowance is made for a variable by subtracting its effect on the other
variables. Thus, if

$$x_{s1.3} = x_1 - x_{c1.3};$$

$$x_{s2.3} = x_2 - x_{c2.3};$$

then $b_{12.3}$ is the slope of $x_{s1.3}$ on $x_{s2.3}$, and $b_{12}$ is the slope of $x_1$ on $x_2$. Specifically:

$$b_{12} = \frac{\sum x_1 x_2}{\sum x_2^2}; \quad \text{but} \quad b_{12.3} = \frac{\sum x_{s1.3}\,x_{s2.3}}{\sum x_{s2.3}^2}.$$

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social sciences, generally have to use the observational rather than the 

experimental method. Since workers in these fields usually have only 
very limited control, if any, over the material they must use, they must 
attempt to hold some of the variables constant statistically (rather than 
experimentally) by means of the techniques explained in this chapter.²

As in previous instances, the total variation of the dependent series is
the sum of two quantities: (1) the variation in the estimated values of
that series from their mean, and (2) the variation of the actual values
from the estimated values, that is,

$$\sum x_1^2 = \sum x_{c1.23}^2 + \sum x_{s1.23}^2.$$

The procedure for computing measures of relationship is essentially the
same as with simple correlation. The standard error of estimate is

$$s_{1.23} = \sqrt{\frac{\sum x_{s1.23}^2}{N}},$$

and the coefficient of multiple determination is

$$R_{1.23}^2 = \frac{\sum x_{c1.23}^2}{\sum x_1^2}.$$

$R_{1.23}^2$ states the proportion of total variation that is present in the variations
of the computed, or $X_{c1.23}$, values, and which has been explained by
reference to the independent variables. The coefficient of multiple correlation
$R_{1.23}$ is the square root of the coefficient of multiple determination.
$R$ has no sign, since the association may be positive with one but
negative with the other independent variable. It is interesting to note
at this point that, as additional associated independent variables are
brought into a problem, the coefficient of multiple determination approaches
1.0 and the unexplained variation approaches zero. If we were able to
include all pertinent independent variables, the coefficient of multiple
determination would be 1.0, and we could make perfect estimates of $X_1$.

Partial correlation. We have seen that the use of variable $X_2$
resulted in a certain amount of explained variation, indicated by $\sum x_{c1.2}^2$,
but some of the variation in the dependent variable was not explained;
this was $\sum x_{s1.2}^2$. Introducing variable $X_3$, in addition to $X_2$, gave
explained variation indicated by $\sum x_{c1.23}^2$, which must exceed $\sum x_{c1.2}^2$ if
variable $X_3$ is germane to the problem. In any event, $\sum x_{c1.23}^2$ cannot be
smaller than $\sum x_{c1.2}^2$.

² Another method, usually not practical, is to select from the observed data observations
that have a constant value with respect to all independent variables except the
one being studied.

Now, the amount of variation unexplained by $X_2$ was $\sum x_{s1.2}^2$, but $X_3$
explained an additional amount of variation, indicated by
$\sum x_{c1.23}^2 - \sum x_{c1.2}^2$. If we write

$$r_{13.2}^2 = \frac{\sum x_{c1.23}^2 - \sum x_{c1.2}^2}{\sum x_{s1.2}^2},$$

we have $r_{13.2}^2$, the coefficient of partial determination. To put the above
expression in words and to state it more generally, we may say that the
coefficient of partial determination is the ratio of: (1) the increase in the
variation of the computed values of the dependent variable resulting from the
introduction of another independent variable to (2) the variation that had not
been explained before the introduction of the new variable.

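Since

$$\sum x_{c1.23}^2 - \sum x_{c1.2}^2 = \sum x_{s1.2}^2 - \sum x_{s1.23}^2 \quad \text{and} \quad \sum x_{s1.2}^2 = \sum x_1^2 - \sum x_{c1.2}^2,$$

the expression for $r_{13.2}^2$ may be written in either of the two following ways:

$$r_{13.2}^2 = \frac{\sum x_{s1.2}^2 - \sum x_{s1.23}^2}{\sum x_{s1.2}^2}$$

or

$$r_{13.2}^2 = \frac{\sum x_{c1.23}^2 - \sum x_{c1.2}^2}{\sum x_1^2 - \sum x_{c1.2}^2}.$$

If the numerator and denominator of the expression last given are divided
by $\sum x_1^2$, we have

$$r_{13.2}^2 = \frac{R_{1.23}^2 - r_{12}^2}{1 - r_{12}^2}.$$
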

In this form the coefficient of partial determination may be regarded as 
the ratio of : (1) the increase in the proportion of variation of the computed 
values of the dependent variable resulting from the introduction of 
another independent variable to (2) the proportion of the variation that 
had not been explained before the introduction of the new variable. 
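
The two forms of the coefficient of partial determination can be compared numerically. The sketch below is illustrative only: the function names are hypothetical, and the figures are the rounded values for the suicide-rate problem that are computed later in this chapter (explained variation 37.290 and 65.730, unexplained variation 66.490, $r_{12}^2 = 0.35932$, $R_{1.23}^2 = 0.63336$).

```python
def partial_determination_from_variation(explained_full, explained_reduced,
                                         unexplained_reduced):
    """Increase in explained variation / variation previously unexplained."""
    return (explained_full - explained_reduced) / unexplained_reduced


def partial_determination_from_ratios(r2_full, r2_reduced):
    """Same measure after dividing numerator and denominator by total variation."""
    return (r2_full - r2_reduced) / (1.0 - r2_reduced)


# Rounded figures taken from the worked example in this chapter.
from_variation = partial_determination_from_variation(65.730, 37.290, 66.490)
from_ratios = partial_determination_from_ratios(0.63336, 0.35932)
print(round(from_variation, 4), round(from_ratios, 4))
```

The two results differ only through the rounding of the inputs, which is the point of the algebra above.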

The square root of $r_{13.2}^2$, $r_{13.2}$, is the coefficient of partial correlation and
takes the sign of $b_{13.2}$ in the estimating equation. The subscript 13.2 for
the coefficient of partial correlation indicates, for our problem, that the
correlation is between suicide rate, $X_1$, and per cent male, $X_3$, when average
age $X_2$ has been held constant at a value of $X_2$. If we could pick out
regions that are exactly alike with respect to age, the simple correlation
between suicide rate and per cent male for those regions would tend to
be the same as the above coefficient of partial correlation. One purpose
of partial (or net) correlation coefficients is to indicate the relative 
importance of the different independent variables in a problem in explain- 
ing variations in the dependent variable. 
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COMPUTATION PROCEDURE 

Computation of sums. Since this chapter will require a considerable
number of measures of relationship among the four variables, it will
be convenient to compute at one time all of the values that are needed
in the different formulas. The original data for the four series, together
with their sums and arithmetic means, are shown in Table 21.2. The
individual squares and products and the sums of the squares and products 

TABLE 21.2

Suicide Rate, Average Age, Per Cent Male, and Business Failure
Rate for 19 Regions of the United States, 1949 or 1950

            Suicide      Average      Per cent     Business-
Region      rate         age          male         failure rate
            X_1          X_2          X_3          X_4

   1        12.49        31.28        48.73        54.63
   2        12.02        32.43        49.27        43.55
   3        10.10        34.50        48.43        66.73
   4        12.61        32.79        49.27        29.25
   5        10.66        31.30        49.25        28.66
   6        11.97        31.20        49.44        35.32
   7        11.44        30.10        50.17        24.68
   8        11.56        32.70        49.58        33.59
   9        11.42        30.80        50.30        26.01
  10        12.47        31.80        49.44        19.13
  11        12.75        29.46        50.55         8.74
  12        10.11        29.16        49.74        26.52
  13         9.38        26.90        49.16        27.61
  14         9.15        26.87        49.80        23.12
  15         6.50        26.60        49.20        22.71
  16         8.25        26.65        50.12        16.17
  17        14.26        29.21        51.40        36.90
  18        17.50        32.10        50.02        81.63
  19         9.08        27.90        50.10        15.13

Total      213.62       572.75       943.97       620.08
Mean     11.243158    30.144737    49.682632    32.635789

$X_1$: Suicides per 100,000.

$X_2$: Median age where a state constitutes a region; otherwise, the
simple mean of the state medians. New York, excluding New York
City, was computed from the relationship:

$$N_{\text{upstate}}\,\text{Med}_{\text{upstate}} = N_{\text{state}}\,\text{Med}_{\text{state}} - N_{\text{city}}\,\text{Med}_{\text{city}}.$$

$X_4$: Failures per 10,000 business concerns.

Data from publications listed below:

Population in 1950: United States Department of Commerce, Bureau of the Census,
Seventeenth Census of the United States, 1950, Vol. I.

Suicide rate in 1949: United States Department of Commerce, Bureau of the Census,
Vital Statistics of the United States, 1949, Part II, Place of Residence.

Per cent male in 1950: United States Department of Commerce, Bureau of the Census,
Seventeenth Census of the United States, 1950, Vol. II, Characteristics of the Population.

Business failure rate, 1949: United States Department of Commerce, Bureau of the
Census, Statistical Abstract of the United States, 1951, and Dun and Bradstreet, Inc.
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Based on data in TabJo 21,2, 
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are shown in Table 21.3. From these we obtain the sums of the squared 
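deviations and the sums of the products of deviations. For example,³

$$\sum x_1^2 = \sum X_1^2 - \bar{X}_1\sum X_1.$$

$$\sum x_2^2 = \sum X_2^2 - \bar{X}_2\sum X_2.$$

$$\sum x_1 x_2 = \sum X_1X_2 - \bar{X}_1\sum X_2 \quad \text{or} \quad \sum X_1X_2 - \bar{X}_2\sum X_1.$$

$$\sum x_1 x_3 = \sum X_1X_3 - \bar{X}_1\sum X_3 \quad \text{or} \quad \sum X_1X_3 - \bar{X}_3\sum X_1.$$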

Using these, and similar formulas for the other sums, gives 

$\sum x_1^2 = 2,505.54 - (11.243158)(213.62) = 103.78.$

$\sum x_2^2 = 17,375.31 - (30.144737)(572.75) = 109.91.$

$\sum x_3^2 = 46,907.35 - (49.682632)(943.97) = 8.44.$

$\sum x_4^2 = 26,146.00 - (32.635789)(620.08) = 5,909.02.$

$\sum x_1x_2 = 6,503.54 - (11.243158)(572.75) = 64.02.$

$\sum x_1x_3 = 10,621.88 - (11.243158)(943.97) = 8.68.$

$\sum x_1x_4 = 7,403.46 - (11.243158)(620.08) = 431.80.$

$\sum x_2x_3 = 28,445.56 - (30.144737)(943.97) = -10.17.$

$\sum x_2x_4 = 19,159.54 - (30.144737)(620.08) = 467.39.$

$\sum x_3x_4 = 30,731.28 - (49.682632)(620.08) = -75.93.$
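
The shortcut used above (sum of squares or products of the original values, less the mean of one series times the sum of the other) is easy to verify by machine. The following is a minimal sketch; the helper name `deviation_sum` is hypothetical, and the inputs are the sums and means quoted in the text.

```python
def deviation_sum(sum_of_products, mean_a, sum_b):
    """Sum of products of deviations: sum(AB) - mean(A) * sum(B).

    With A and B the same series this gives the sum of squared deviations.
    """
    return sum_of_products - mean_a * sum_b


# Sums and means for X1 (suicide rate), X2 (average age), X3 (per cent male).
sum_x1_sq = deviation_sum(2505.54, 11.243158, 213.62)
sum_x1x2 = deviation_sum(6503.54, 11.243158, 572.75)
sum_x2x3 = deviation_sum(28445.56, 30.144737, 943.97)

print(round(sum_x1_sq, 2), round(sum_x1x2, 2), round(sum_x2x3, 2))
```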

Gross measures of relationship. Simple correlation is in reality 
gross correlation, since it measures the relationship between two variables, 

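³ The derivation of these equations is fairly obvious.

$$\sum x_1^2 = \sum (X_1 - \bar{X}_1)^2,$$
$$= \sum (X_1^2 - 2X_1\bar{X}_1 + \bar{X}_1^2),$$
$$= \sum X_1^2 - 2\bar{X}_1\sum X_1 + N\bar{X}_1^2,$$
$$= \sum X_1^2 - 2\bar{X}_1\sum X_1 + \bar{X}_1\sum X_1,$$
$$= \sum X_1^2 - \bar{X}_1\sum X_1.$$

$$\sum x_1x_2 = \sum [(X_1 - \bar{X}_1)(X_2 - \bar{X}_2)],$$
$$= \sum (X_1X_2 - \bar{X}_1X_2 - \bar{X}_2X_1 + \bar{X}_1\bar{X}_2),$$
$$= \sum X_1X_2 - \bar{X}_1\sum X_2 - \bar{X}_2\sum X_1 + N\bar{X}_1\bar{X}_2,$$
$$= \sum X_1X_2 - \frac{\sum X_1\sum X_2}{N} - \frac{\sum X_1\sum X_2}{N} + \frac{\sum X_1\sum X_2}{N},$$
$$= \sum X_1X_2 - \bar{X}_1\sum X_2.$$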

⁴ In Table 21.2 the observations usually have four significant digits. Therefore,
the products in Table 21.3 are usually recorded to five or six digits. Nevertheless,
the values shown here have only three digits in two instances. The various measures
in this chapter computed from these values cannot contain more than three or four
significant digits, and sometimes only two or three. More have been recorded, however,
in order to afford internal checks on computations and to contribute to the
accuracy of final results based on intermediate computations.
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without any adjustment by correlation technique for the effects of other 
variables. Using the symbols developed in the introductory section, we
compute the following measures if we wish to correlate suicide rates $X_1$
with average age $X_2$ alone:


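Estimating equation:

$$X_{c1.2} = a_{1.2} + b_{12}X_2 \quad \text{or} \quad x_{c1.2} = b_{12}x_2.$$

Normal equations:

$$\text{I.} \quad \sum X_1 = Na_{1.2} + b_{12}\sum X_2 \quad \text{or} \quad \bar{X}_1 = a_{1.2} + b_{12}\bar{X}_2 \quad (a_{1.2} = \bar{X}_1 - b_{12}\bar{X}_2).$$

$$\text{II.} \quad \sum X_1X_2 = a_{1.2}\sum X_2 + b_{12}\sum X_2^2 \quad \text{or} \quad \sum x_1x_2 = b_{12}\sum x_2^2, \quad b_{12} = \frac{\sum x_1x_2}{\sum x_2^2}.$$

Total variation:

$$\sum x_1^2 = \sum X_1^2 - \bar{X}_1\sum X_1.$$

Sum of squares of computed values and explained variation:

$$\sum X_{c1.2}^2 = a_{1.2}\sum X_1 + b_{12}\sum X_1X_2. \qquad \sum x_{c1.2}^2 = b_{12}\sum x_1x_2.$$

(Sum of explained squares) (Explained variation)

Unexplained variation:

$$\sum x_{s1.2}^2 = \sum X_1^2 - \sum X_{c1.2}^2 \quad \text{or} \quad \sum x_1^2 - \sum x_{c1.2}^2.$$

Standard error of estimate:

$$s_{1.2} = \sqrt{\frac{\sum X_1^2 - \sum X_{c1.2}^2}{N}} \quad \text{or} \quad \sqrt{\frac{\sum x_1^2 - \sum x_{c1.2}^2}{N}}.$$

Coefficient of correlation:

$$r_{12} = \sqrt{\frac{\sum X_{c1.2}^2 - \bar{X}_1\sum X_1}{\sum X_1^2 - \bar{X}_1\sum X_1}} \quad \text{or} \quad \sqrt{\frac{\sum x_{c1.2}^2}{\sum x_1^2}}.$$
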

The reader may already have noticed that we have merely set down the
various equations and formulas used in simple correlation, but with
slightly different symbols.

Results of computations based on these expressions are given below.
In order to avoid needless labor, the formulas shown on the right above,
using deviations from means, are used:
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Constants for estimating equation: 

$$b_{12} = \frac{64.02}{109.91} = +0.58248.$$

$$a_{1.2} = 11.243158 - (0.58248)(30.144737) = -6.316.$$

Estimating equation:

$$X_{c1.2} = -6.316 + 0.5825X_2.$$
$$x_{c1.2} = +0.5825x_2.$$

Total variation:

$$\sum x_1^2 = 2,505.54 - (11.243158)(213.62) = 103.78.$$

Explained variation:

$$\sum x_{c1.2}^2 = (0.58248)(64.02) = 37.290.$$

Unexplained variation:

$$\sum x_{s1.2}^2 = 103.78 - 37.290 = 66.490.$$

Standard error of estimate:

$$s_{1.2}^2 = \frac{66.490}{19} = 3.499.$$
$$s_{1.2} = 1.87.$$

Coefficient of correlation:

$$r_{12}^2 = \frac{37.290}{103.78} = 0.35932.$$
$$r_{12} = +0.5994.$$


Following the same procedure for variable 3, we obtain: 


$b_{13} = +1.02844$;
$a_{1.3} = -39.852$;
$\sum x_{c1.3}^2 = 8.927$;
$\sum x_{s1.3}^2 = 94.853$;
$s_{1.3} = 2.23$;
$r_{13}^2 = 0.08602$;
$r_{13} = +0.2933$.
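
The whole chain of gross measures follows from three sums of deviations and two means, so it can be reproduced in a few lines. This is a sketch only; the function name and dictionary keys are hypothetical, and the standard error divides by $N$ as in the text.

```python
from math import copysign, sqrt


def simple_correlation_measures(sum_xy, sum_xx, sum_yy, mean_y, mean_x, n):
    """Gross (two-variable) measures from sums of deviations.

    sum_xy -- sum of products of deviations (dependent with independent)
    sum_xx -- sum of squared deviations of the independent variable
    sum_yy -- sum of squared deviations of the dependent variable
    """
    b = sum_xy / sum_xx                      # slope
    a = mean_y - b * mean_x                  # constant term
    explained = b * sum_xy                   # explained variation
    unexplained = sum_yy - explained         # unexplained variation
    r_squared = explained / sum_yy           # coefficient of determination
    return {
        "a": a,
        "b": b,
        "explained": explained,
        "unexplained": unexplained,
        "std_error": sqrt(unexplained / n),
        "r_squared": r_squared,
        "r": copysign(sqrt(r_squared), b),   # r takes the sign of b
    }


# Suicide rate on average age, then suicide rate on per cent male.
age = simple_correlation_measures(64.02, 109.91, 103.78, 11.243158, 30.144737, 19)
male = simple_correlation_measures(8.68, 8.44, 103.78, 11.243158, 49.682632, 19)
print(round(age["b"], 5), round(age["r"], 4))
print(round(male["b"], 5), round(male["r"], 4))
```

Because the slope is carried here at full precision rather than rounded to five places, the constant term may differ from the hand computation in the last recorded digit.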

Chart 21.1 shows scatter diagrams of the simple relationship between 
suicide rates and each of the independent variables being considered. 
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The correlation coefficients for these three relationships and the coeffi- 
cients of correlation between the three independent variables are : 

$r_{12} = +0.5994.$    $r_{23} = -0.3339.$

$r_{13} = +0.2933.$    $r_{24} = +0.5800.$

$r_{14} = +0.5514.$    $r_{34} = -0.3400.$

It is interesting to note, at this point, that average age, X 2 , showed the 
highest gross correlation with suicide rates, and that per cent male, X 3 , 



showed the lowest. Later we shall see whether the independent variables 
retain the same rank in importance when the effect of the other variables 
is removed. 

Two independent variables; multiple correlation. Naturally, we
can expect to estimate suicide rates more accurately if we take two inde- 
pendent variables into consideration, rather than only one. Hence, let 
us make estimates from both average age and per cent male. The 
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estimating-equation type is 

$$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3,$$

or, in terms of deviations,

$$x_{c1.23} = b_{12.3}x_2 + b_{13.2}x_3.$$

The 1.23 subscripts after $X_c$ and $a$ tell us that we are estimating values of
$X_1$ (suicide rates) from variables $X_2$ (average age) and $X_3$ (per cent male).
The first b indicates the normal change in suicide rates associated with a 
unit change in average age for regions that have the same per cent male 
composition ; the second b tells us the normal change in suicide rates asso- 
ciated with a unit change in per cent male for regions of the same average 
age. 

The normal equations required are: 

I. $\sum X_1 = Na_{1.23} + b_{12.3}\sum X_2 + b_{13.2}\sum X_3$;

II. $\sum X_1X_2 = a_{1.23}\sum X_2 + b_{12.3}\sum X_2^2 + b_{13.2}\sum X_2X_3$;

III. $\sum X_1X_3 = a_{1.23}\sum X_3 + b_{12.3}\sum X_2X_3 + b_{13.2}\sum X_3^2$.

Considerable labor may be saved if the normal equations are put in terms
of deviations from the means. In this case, the first equation disappears,
since $\sum x_1$, $\sum x_2$, and $\sum x_3$ are each zero. The remaining two equations
are:

II. $\sum x_1x_2 = b_{12.3}\sum x_2^2 + b_{13.2}\sum x_2x_3$;

III. $\sum x_1x_3 = b_{12.3}\sum x_2x_3 + b_{13.2}\sum x_3^2$.

Making the required substitutions, we have:

II. $64.02 = 109.91b_{12.3} - 10.17b_{13.2}$;

III. $8.68 = -10.17b_{12.3} + 8.44b_{13.2}$.

Solving these simultaneous equations gives: 

612.3 = +0.76267; 

613.2 — + 1 . 94744 . 

To get ai.23, we use Equation I, dividing it by N, obtaining: 

Xi = 061.23 + 612.3X2 + 613.2X3. 

<*1.23 == Xl — ?>i2.3X2.~ 613.2X31 

== 11.243158 - (0.76267)(30.144737) - (1.94744) (49.682632), 
=*■-108:.%.- 

, The estWating equation, theu,, is 

■tcun =r — ;10.8..%.+ 0 . 703 X 2 + Ii947X8. 
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The explained variation is® 

= 6i2.32rirr2 + hn.^^XiXz, 

= (0.76267) (64.02) + (1.94744) (8.68), 
= 65.730. 
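The solution of normal equations II and III can be checked with a few lines of arithmetic. The following is a minimal sketch, not the procedure used in the text: it applies Cramer's rule to the two deviation-form equations, and the variable names are our own labels for the sums quoted above.

```python
# Sums of squares and cross-products of deviations, as quoted in the text.
sum_x2_sq = 109.91   # sum of x2 squared
sum_x3_sq = 8.44     # sum of x3 squared
sum_x2x3 = -10.17    # sum of x2 * x3
sum_x1x2 = 64.02     # sum of x1 * x2
sum_x1x3 = 8.68      # sum of x1 * x3
mean_x1, mean_x2, mean_x3 = 11.243158, 30.144737, 49.682632

# Cramer's rule applied to normal equations II and III (deviation form).
det = sum_x2_sq * sum_x3_sq - sum_x2x3 ** 2
b12_3 = (sum_x1x2 * sum_x3_sq - sum_x2x3 * sum_x1x3) / det
b13_2 = (sum_x2_sq * sum_x1x3 - sum_x2x3 * sum_x1x2) / det

# Normal equation I divided by N gives the constant term.
a1_23 = mean_x1 - b12_3 * mean_x2 - b13_2 * mean_x3

# Explained variation from the two net regression coefficients.
explained = b12_3 * sum_x1x2 + b13_2 * sum_x1x3

print(round(b12_3, 5), round(b13_2, 5), round(a1_23, 2), round(explained, 3))
```

The results should agree with the hand computation above to within rounding of the last digit.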


The other measures of relationship are now computed in a manner precisely similar to that employed when there was only one independent variable.

$$\Sigma x_{s1.23}^2 = \Sigma x_1^2 - \Sigma x_{c1.23}^2 = 103.78 - 65.730 = 38.050.$$

$$s_{1.23}^2 = \frac{\Sigma x_{s1.23}^2}{N} = \frac{38.050}{19} = 2.003; \qquad s_{1.23} = 1.42.$$

$$R_{1.23}^2 = \frac{\Sigma x_{c1.23}^2}{\Sigma x_1^2} = \frac{65.730}{103.78} = 0.63336; \qquad R_{1.23} = 0.7958.$$

Since the coefficient of multiple determination, $R_{1.23}^2$, is 0.6334, we have explained 63 per cent of the variation present in $X_1$. Notice that $R_{1.23}^2$ is greater than either $r_{12}^2$ or $r_{13}^2$; the value of $r_{12}^2$ was found to be 0.3593, while $r_{13}^2$ was 0.0860.

The standard error of estimate, $s_{1.23}$, was ascertained to be 1.42, which is smaller than either $s_{1.2} = 1.87$ or $s_{1.3} = 2.23$. Estimates made of $X_1$ using the two independent variables $X_2$ and $X_3$ will be more satisfactory than estimates made by use of either $X_2$ or $X_3$ alone. More specifically, the standard deviation of the $X_1$ values around the estimating equation

$$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3$$

is less than the standard deviation of the $X_1$ values around

$$X_{c1.2} = a_{1.2} + b_{12}X_2$$

or around

$$X_{c1.3} = a_{1.3} + b_{13}X_3.$$
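The measures just described follow from the explained variation by simple arithmetic. The sketch below reproduces them under the text's convention of dividing the unexplained variation by $N = 19$; the variable names are ours.

```python
import math

b12_3, b13_2 = 0.76267, 1.94744      # net regression coefficients from the text
sum_x1x2, sum_x1x3 = 64.02, 8.68     # sums of products of deviations
total_variation = 103.78             # sum of x1 squared
n = 19                               # N, as used in the text

explained = b12_3 * sum_x1x2 + b13_2 * sum_x1x3
unexplained = total_variation - explained

# Standard error of estimate, dividing by N as the text does.
s_1_23 = math.sqrt(unexplained / n)

# Coefficients of multiple determination and multiple correlation.
R2_1_23 = explained / total_variation
R_1_23 = math.sqrt(R2_1_23)

print(round(unexplained, 3), round(s_1_23, 2), round(R2_1_23, 5), round(R_1_23, 4))
```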


Two independent variables: partial correlation. When only one independent variable (age) was considered, the explained variation was $\Sigma x_{c1.2}^2 = 37.290$. When two independent variables (age and per cent male) were used, the explained variation was increased to $\Sigma x_{c1.23}^2 = 65.730$. Therefore, the increase in the variation explained by per cent male is

$$\Sigma x_{c1.23}^2 - \Sigma x_{c1.2}^2 = 65.730 - 37.290 = 28.440.$$

After taking age alone into consideration, the variation remaining to be explained was

$$\Sigma x_{s1.2}^2 = \Sigma x_1^2 - \Sigma x_{c1.2}^2 = 103.78 - 37.290 = 66.490.$$

The proportion of the variation previously unexplained, then, which was explained by including per cent male also, is the ratio

$$\frac{28.440}{66.490} = 0.42773.$$

As noted before, this ratio is known as the coefficient of partial determination, the square root of which is the coefficient of partial correlation. That is,

$$r_{13.2}^2 = \frac{\Sigma x_{c1.23}^2 - \Sigma x_{c1.2}^2}{\Sigma x_1^2 - \Sigma x_{c1.2}^2} = \frac{65.730 - 37.290}{66.490} = 0.42773;$$

$$r_{13.2} = +0.6540.$$

⁵ Also, $\Sigma x_{c1.23}^2 = \Sigma X_{c1.23}^2 - \bar{X}_1\Sigma X_1$, where $\Sigma X_{c1.23}^2 = a_{1.23}\Sigma X_1 + b_{12.3}\Sigma X_1X_2 + b_{13.2}\Sigma X_1X_3$.


The sign of this coefficient of partial correlation is the same as the sign of $b_{13.2}$ in the estimating equation. This coefficient is a measure of the closeness of relationship between suicide rate and per cent male when age has been held constant statistically; it is the simple correlation coefficient which would be expected for regions of the same average age. As previously stated, if the numerator and denominator of the above expression for $r_{13.2}^2$ are both divided by $\Sigma x_1^2$, we obtain a formula showing the relationship between the partial determination coefficient and two gross determination coefficients. Thus,

$$r_{13.2}^2 = \frac{R_{1.23}^2 - r_{12}^2}{1 - r_{12}^2} = \frac{0.63336 - 0.35932}{0.64068} = 0.42773;$$

$$r_{13.2} = +0.6540.$$
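Both routes to the coefficient of partial determination are the same ratio, taken once on sums of squares and once on coefficients of determination. The sketch below makes this explicit; the helper name `partial_determination` is our own and does not come from the text.

```python
import math

def partial_determination(full, reduced, total=1.0):
    """Share of the previously unexplained variation that is explained
    by the added variable: (full - reduced) / (total - reduced)."""
    return (full - reduced) / (total - reduced)

# Route 1: explained variations (sums of squares), with total variation 103.78.
r2_13_2_from_sums = partial_determination(65.730, 37.290, total=103.78)

# Route 2: the same quantities after division by 103.78.
r2_13_2_from_coeffs = partial_determination(0.63336, 0.35932)

# The sign is taken from b13.2, which is positive in this problem.
r_13_2 = math.sqrt(r2_13_2_from_sums)

print(round(r2_13_2_from_sums, 5), round(r2_13_2_from_coeffs, 5), round(r_13_2, 4))
```

The same helper gives $r_{12.3}^2$ when the reduced equation uses per cent male alone.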


Note that each of the values recorded in this formula is that in the preceding formula divided by 103.78 (in fact, this is the procedure we have already employed to obtain $R_{1.23}^2$ and $r_{12}^2$). This formula may then be used as a check⁶ on the final division needed to compute the partial coefficient. Also, it may be used when $r_{12}^2$ is computed by some procedure other than

$$r_{12}^2 = \frac{\Sigma x_{c1.2}^2}{\Sigma x_1^2},$$

or when the coefficients of determination, or coefficients of correlation, but not the original data, are given.

As a companion measure to $r_{13.2}$, we should obtain the partial coefficient $r_{12.3}$, which measures the relationship between suicide rate and age when per cent male has been held constant. This is done by finding the increase in the variation of the computed values by using age and per cent male in our estimating equation rather than using per cent male alone. Thus:

$$r_{12.3}^2 = \frac{\Sigma x_{c1.23}^2 - \Sigma x_{c1.3}^2}{\Sigma x_1^2 - \Sigma x_{c1.3}^2} = \frac{65.730 - 8.927}{94.853}$$

$$= \frac{R_{1.23}^2 - r_{13}^2}{1 - r_{13}^2} = \frac{0.63336 - 0.08602}{0.91398} = 0.59885;$$

$$r_{12.3} = +0.7739.$$

Partial coefficients, such as $r_{13.2}$ and $r_{12.3}$, are often referred to as first-order coefficients, since one variable has been held constant. Simple coefficients are called zero-order coefficients, since no variables were held constant. Later in the chapter, we shall consider $r_{12.34}$, $r_{13.24}$, and $r_{14.23}$, which are second-order coefficients. Stated generally, the order designation indicates the number of variables that have been held constant statistically.

The gross correlation between suicide rate and age, $r_{12}$, it will be recalled, was +0.5994. Removing the effect of variations in per cent male from both variables has increased the relationship materially, since $r_{12.3} = +0.7739$. Similarly, $r_{13}$, the gross correlation between suicide rate and per cent male, was +0.2933. Removing the effect of variations in age resulted in $r_{13.2} = +0.6540$, again a decided increase.

Relationship between $R_{1.23}$ and the measures of gross and partial correlation. The reader may be surprised to note that $R_{1.23}$ is but 0.7958 when $r_{12} = +0.5994$ and $r_{13} = +0.2933$. It is not a characteristic of these measures that the multiple coefficient is the sum of the two gross coefficients. The relationship is more complex than that.⁷ It may be said, however, that, for given values of $r_{12}$ and $r_{13}$ having the same sign, the less the duplication in the independent variables (that is, the lower their positive or the higher their negative correlation; $r_{23}$ in this case), the higher will be the multiple correlation. In the present instance, $r_{23} = -0.3339$, and hence the addition of either age or per cent male materially improves the correlation over that obtained from the use of either independent variable alone.

⁶ Note, however, that there is a tendency for the numerator and denominator to lose a significant digit because of the division by $\Sigma x_1^2$.

Neither is the multiple coefficient of correlation the sum of the two partial coefficients. However, there is an additive relationship (derived from the expressions just given for $r_{13.2}^2$ and $r_{12.3}^2$) which may be written in either of two forms:

$$R_{1.23}^2 = r_{12}^2 + r_{13.2}^2(1 - r_{12}^2),$$
$$= 0.35932 + (0.42773)(1 - 0.35932) = 0.6334, \text{ or}$$

$$R_{1.23}^2 = r_{13}^2 + r_{12.3}^2(1 - r_{13}^2),$$
$$= 0.08602 + (0.59885)(1 - 0.08602) = 0.6334.$$

It is interesting to note the thought behind these equations. The first one, for example, involves the sum of: (1) the proportion of variation explained by using one independent variable, $r_{12}^2$, and (2) the product of (a) the proportion of variation unexplained by that independent variable, $1 - r_{12}^2$; and (b) the proportion of (a) explained as a result of using the other independent variable in addition to the first one, $r_{13.2}^2$.
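The two forms of the additive relationship can be verified directly from the coefficients already obtained. This is only an arithmetic check with variable names of our own choosing.

```python
# Gross and partial coefficients of determination quoted in the text.
r2_12, r2_13 = 0.35932, 0.08602
r2_13_2, r2_12_3 = 0.42773, 0.59885

# Start from age, then add the share of the remainder explained by per cent male.
R2_age_first = r2_12 + r2_13_2 * (1 - r2_12)

# Start from per cent male, then add the share of the remainder explained by age.
R2_male_first = r2_13 + r2_12_3 * (1 - r2_13)

print(round(R2_age_first, 4), round(R2_male_first, 4))
```

Either order of entry leads to the same coefficient of multiple determination, as it must.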

Three independent variables: multiple correlation. In the preceding paragraphs, we considered the two independent variables, average age, $X_2$, and per cent male, $X_3$. If we add a third independent variable, business-failure rate, $X_4$, we use an estimating equation of the type

$$X_{c1.234} = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4.$$

To obtain the four constants, four normal equations are required if we use $X$-values. They are

$$\text{I.}\quad \Sigma X_1 = Na_{1.234} + b_{12.34}\Sigma X_2 + b_{13.24}\Sigma X_3 + b_{14.23}\Sigma X_4;$$
$$\text{II.}\quad \Sigma X_1X_2 = a_{1.234}\Sigma X_2 + b_{12.34}\Sigma X_2^2 + b_{13.24}\Sigma X_2X_3 + b_{14.23}\Sigma X_2X_4;$$
$$\text{III.}\quad \Sigma X_1X_3 = a_{1.234}\Sigma X_3 + b_{12.34}\Sigma X_2X_3 + b_{13.24}\Sigma X_3^2 + b_{14.23}\Sigma X_3X_4;$$
$$\text{IV.}\quad \Sigma X_1X_4 = a_{1.234}\Sigma X_4 + b_{12.34}\Sigma X_2X_4 + b_{13.24}\Sigma X_3X_4 + b_{14.23}\Sigma X_4^2.$$

However, by using $x$-values, we eliminate normal equation I, as before, giving

$$\text{II.}\quad \Sigma x_1x_2 = b_{12.34}\Sigma x_2^2 + b_{13.24}\Sigma x_2x_3 + b_{14.23}\Sigma x_2x_4;$$
$$\text{III.}\quad \Sigma x_1x_3 = b_{12.34}\Sigma x_2x_3 + b_{13.24}\Sigma x_3^2 + b_{14.23}\Sigma x_3x_4;$$
$$\text{IV.}\quad \Sigma x_1x_4 = b_{12.34}\Sigma x_2x_4 + b_{13.24}\Sigma x_3x_4 + b_{14.23}\Sigma x_4^2.$$

Substituting in normal equations II, III, and IV the sums of squared deviations and the sums of products of deviations, obtained earlier, we have

$$\text{II.}\quad 64.02 = 109.91b_{12.34} - 10.17b_{13.24} + 467.39b_{14.23};$$
$$\text{III.}\quad 8.68 = -10.17b_{12.34} + 8.44b_{13.24} - 75.93b_{14.23};$$
$$\text{IV.}\quad 431.80 = 467.39b_{12.34} - 75.93b_{13.24} + 5{,}909.20b_{14.23}.$$

Since the procedure for solving three simultaneous equations was given on pages 487-489, it will not be repeated here. The solution yields

$$b_{12.34} = +0.53534;$$
$$b_{13.24} = +2.20484;$$
$$b_{14.23} = +0.05906.$$

If we write normal equation I in the form

$$a_{1.234} = \bar{X}_1 - b_{12.34}\bar{X}_2 - b_{13.24}\bar{X}_3 - b_{14.23}\bar{X}_4,$$

we can substitute the values of the arithmetic means from Table 21.1 and the $b$-values just given, obtaining

$$a_{1.234} = 11.243158 - (0.53534)(30.144737) - (2.20484)(49.682632) - (0.05906)(32.635789),$$
$$= -116.36.$$

The estimating equation, then, is

$$X_{c1.234} = -116.36 + 0.535X_2 + 2.205X_3 + 0.0591X_4.$$

Explained variation is

$$\Sigma x_{c1.234}^2 = b_{12.34}\Sigma x_1x_2 + b_{13.24}\Sigma x_1x_3 + b_{14.23}\Sigma x_1x_4,$$
$$= (0.53534)(64.02) + (2.20484)(8.68) + (0.05906)(431.80),$$
$$= 78.913,$$

and unexplained variation is

$$\Sigma x_{s1.234}^2 = \Sigma x_1^2 - \Sigma x_{c1.234}^2,$$
$$= 103.78 - 78.913 = 24.867.$$
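The three deviation-form normal equations can also be solved mechanically. The sketch below uses plain Gaussian elimination with partial pivoting, which is our choice for illustration and not the hand procedure of pages 487-489; the function and variable names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Build the augmented matrix without modifying the inputs.
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Bring the largest remaining entry of this column to the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

# Sums of squares and products of deviations for X2, X3, X4 (from the text).
sums = [[109.91, -10.17, 467.39],
        [-10.17, 8.44, -75.93],
        [467.39, -75.93, 5909.20]]
cross = [64.02, 8.68, 431.80]   # sums of products of x1 with x2, x3, x4

b12_34, b13_24, b14_23 = solve_linear_system(sums, cross)
a1_234 = 11.243158 - b12_34 * 30.144737 - b13_24 * 49.682632 - b14_23 * 32.635789
explained = b12_34 * cross[0] + b13_24 * cross[1] + b14_23 * cross[2]

print(round(b12_34, 5), round(b13_24, 5), round(b14_23, 5))
print(round(a1_234, 2), round(explained, 3))
```

Because the sums are quoted to only two decimals, the last digit of each coefficient may differ slightly from the hand solution.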





We can now compute the standard error of estimate, which is

$$s_{1.234} = \sqrt{\frac{\Sigma x_{s1.234}^2}{N}} = \sqrt{\frac{24.867}{19}} = 1.14.$$

The coefficient of multiple determination and the coefficient of multiple correlation are

$$R_{1.234}^2 = \frac{\Sigma x_{c1.234}^2}{\Sigma x_1^2} = \frac{78.913}{103.78} = 0.76039;$$

$$R_{1.234} = 0.8720.$$


Before proceeding to compute partial coefficients, it is desirable to see what improvement in our relationship has resulted from using variable $X_4$. It will be recalled that $R_{1.23}^2$ was 0.6334, indicating that we had explained 63 per cent of the variation in $X_1$ by referring to $X_2$ and $X_3$. We have just found $R_{1.234}^2$ to be 0.7604. Now, by use of the three independent variables, we have explained 76 per cent of the variation in the dependent variable.⁸ Not only does $R_{1.234}^2$ exceed $R_{1.23}^2$, but it is also larger than either $R_{1.24}^2$ or $R_{1.34}^2$. Neither of these last two coefficients has been previously computed. They are

$$R_{1.24}^2 = 0.4218 \quad\text{and}\quad R_{1.34}^2 = 0.5654.$$

It had been noted previously (page 543) that $R_{1.23}^2$ was larger than either $r_{12}^2$ or $r_{13}^2$. The reader can verify (1) that $R_{1.24}^2$ exceeds either $r_{12}^2$ or $r_{14}^2$, and (2) that $R_{1.34}^2$ is larger than either $r_{13}^2$ or $r_{14}^2$.

As the value of $R^2$ or $R$ increases with the addition of appropriate independent variables, the value of the standard error of estimate decreases. We previously found $s_{1.23}$ to be 1.42; now we see that $s_{1.234} = 1.14$. The values of $s_{1.24}$ and $s_{1.34}$ (neither of which was computed before) are each larger than $s_{1.234}$; they are

$$s_{1.24} = 1.78 \quad\text{and}\quad s_{1.34} = 1.54.$$

It is clear that estimates of suicide rates made from the use of all three of the independent variables will be more satisfactory than estimates made by using any two of them. Stated more exactly, the standard deviation of the $X_1$ values around the estimating equation

$$X_{c1.234} = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4$$

is smaller than the standard deviation of the $X_1$ values around

$$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3,$$

or around

$$X_{c1.24} = a_{1.24} + b_{12.4}X_2 + b_{14.2}X_4,$$

or around

$$X_{c1.34} = a_{1.34} + b_{13.4}X_3 + b_{14.3}X_4.$$

⁸ It must be remembered that adding another independent variable causes the loss of an additional degree of freedom. Thus, it may occasionally happen that the value of $R^2$ may be increased, but the increase may not be significant. Testing the significance of partial and multiple coefficients of determination is discussed toward the end of Chapter


Three independent variables: partial correlation. Paralleling the procedure previously used,

$$r_{14.23}^2 = \frac{\Sigma x_{c1.234}^2 - \Sigma x_{c1.23}^2}{\Sigma x_1^2 - \Sigma x_{c1.23}^2} = \frac{78.913 - 65.730}{103.78 - 65.730} = 0.34647;$$

$$r_{14.23} = +0.5886.$$

Since $r_{14.23}^2 = 0.3465$, the use of independent variable $X_4$ enabled us to explain 35 per cent of the variation which $X_2$ and $X_3$ had failed to explain. The sign of $r_{14.23}$ is positive, to agree with the sign of $b_{14.23}$, and this coefficient measures the relationship between suicide rate $X_1$ and business-failure rate $X_4$, when $X_2$ and $X_3$ have been held constant statistically. At a later point we shall obtain the values of $r_{13.24}$ and $r_{12.34}$, which are, respectively, measures of the correlation between variables $X_1$ and $X_3$ with $X_2$ and $X_4$ held constant and between variables $X_1$ and $X_2$ with $X_3$ and $X_4$ held constant.

The value of $r_{14.23}^2$ may also be obtained from the expression

$$r_{14.23}^2 = \frac{R_{1.234}^2 - R_{1.23}^2}{1 - R_{1.23}^2} = \frac{0.76039 - 0.63336}{1 - 0.63336} = 0.34647;$$

$$r_{14.23} = +0.5886.$$


Four or more independent variables. Although the reader can probably supply the formulas for multiple and partial correlation when more than three independent variables are to be used, a set of generalized expressions may be helpful. The formulas which follow are expansions of those already used; generalizations of certain formulas which have not yet been employed will be given at the appropriate later locations. For $m$ variables, we have:⁹

Estimating equation:

$$X_{c1.234\cdots m} = a_{1.234\cdots m} + b_{12.34\cdots m}X_2 + b_{13.24\cdots m}X_3 + b_{14.23\cdots m}X_4 + \cdots + b_{1m.23\cdots(m-1)}X_m.$$

⁹ When there are four or more independent variables, it is advisable to use the Doolittle method (or some other systematic procedure) for the solution of the simultaneous equations. The Doolittle method was described on pages 498-503.

Normal equations, in deviation form:

$$\text{I.}\quad a_{1.234\cdots m} = \bar{X}_1 - b_{12.34\cdots m}\bar{X}_2 - b_{13.24\cdots m}\bar{X}_3 - b_{14.23\cdots m}\bar{X}_4 - \cdots - b_{1m.23\cdots(m-1)}\bar{X}_m.$$
$$\text{II.}\quad \Sigma x_1x_2 = b_{12.34\cdots m}\Sigma x_2^2 + b_{13.24\cdots m}\Sigma x_2x_3 + b_{14.23\cdots m}\Sigma x_2x_4 + \cdots + b_{1m.23\cdots(m-1)}\Sigma x_2x_m.$$
$$\text{III.}\quad \Sigma x_1x_3 = b_{12.34\cdots m}\Sigma x_2x_3 + b_{13.24\cdots m}\Sigma x_3^2 + b_{14.23\cdots m}\Sigma x_3x_4 + \cdots + b_{1m.23\cdots(m-1)}\Sigma x_3x_m.$$
$$\text{IV.}\quad \Sigma x_1x_4 = b_{12.34\cdots m}\Sigma x_2x_4 + b_{13.24\cdots m}\Sigma x_3x_4 + b_{14.23\cdots m}\Sigma x_4^2 + \cdots + b_{1m.23\cdots(m-1)}\Sigma x_4x_m.$$
$$\vdots$$
$$m.\quad \Sigma x_1x_m = b_{12.34\cdots m}\Sigma x_2x_m + b_{13.24\cdots m}\Sigma x_3x_m + b_{14.23\cdots m}\Sigma x_4x_m + \cdots + b_{1m.23\cdots(m-1)}\Sigma x_m^2.$$

Explained variation:

$$\Sigma x_{c1.234\cdots m}^2 = b_{12.34\cdots m}\Sigma x_1x_2 + b_{13.24\cdots m}\Sigma x_1x_3 + b_{14.23\cdots m}\Sigma x_1x_4 + \cdots + b_{1m.23\cdots(m-1)}\Sigma x_1x_m.$$

Unexplained variation:

$$\Sigma x_{s1.234\cdots m}^2 = \Sigma x_1^2 - \Sigma x_{c1.234\cdots m}^2.$$

Standard error of estimate:

$$s_{1.234\cdots m} = \sqrt{\frac{\Sigma x_{s1.234\cdots m}^2}{N}}.$$

Coefficient of multiple determination:

$$R_{1.234\cdots m}^2 = r_{12}^2 + r_{13.2}^2(1 - r_{12}^2) + r_{14.23}^2(1 - R_{1.23}^2) + \cdots + r_{1m.23\cdots(m-1)}^2(1 - R_{1.234\cdots(m-1)}^2).$$

Coefficients of partial determination:

$$r_{1m.23\cdots(m-1)}^2 = \frac{\Sigma x_{c1.234\cdots m}^2 - \Sigma x_{c1.234\cdots(m-1)}^2}{\Sigma x_1^2 - \Sigma x_{c1.234\cdots(m-1)}^2} = \frac{R_{1.234\cdots m}^2 - R_{1.234\cdots(m-1)}^2}{1 - R_{1.234\cdots(m-1)}^2};$$

$$r_{12.34\cdots m}^2 = \frac{\Sigma x_{c1.234\cdots m}^2 - \Sigma x_{c1.34\cdots m}^2}{\Sigma x_1^2 - \Sigma x_{c1.34\cdots m}^2} = \frac{R_{1.234\cdots m}^2 - R_{1.34\cdots m}^2}{1 - R_{1.34\cdots m}^2}.$$


Multiple-partial coefficients. Just as the coefficient of partial determination measures: (1) the increase in the amount of variation of the computed values of the dependent variable resulting from the introduction of another independent variable relative to (2) the variation which had not been explained before the introduction of the new variable, so the multiple-partial coefficient of determination measures the relative increase resulting from the introduction of two or more new independent variables. For example,

$$r_{1(34).2}^2 = \frac{\Sigma x_{c1.234}^2 - \Sigma x_{c1.2}^2}{\Sigma x_1^2 - \Sigma x_{c1.2}^2} = \frac{R_{1.234}^2 - r_{12}^2}{1 - r_{12}^2}.$$

All of the values called for in these expressions have already been obtained, so we may compute

$$r_{1(34).2}^2 = \frac{\Sigma x_{c1.234}^2 - \Sigma x_{c1.2}^2}{\Sigma x_1^2 - \Sigma x_{c1.2}^2} = \frac{78.913 - 37.290}{103.78 - 37.290} = 0.6260,$$

$$r_{1(34).2}^2 = \frac{R_{1.234}^2 - r_{12}^2}{1 - r_{12}^2} = \frac{0.76039 - 0.35932}{1 - 0.35932} = 0.6260.$$

The value of $r_{1(34).2}^2$ indicates that, of the variation in $X_1$ which was not explained by $X_2$, 63 per cent has been explained by $X_3$ and $X_4$. As would be expected, $r_{1(34).2}^2$ is larger than either $r_{13.2}^2 = 0.4277$ or $r_{14.2}^2 = 0.0977$. Note that the coefficient of multiple-partial correlation ($r_{1(34).2} = 0.7912$ in this instance) has no sign, since the relationship between the dependent variable and each independent variable in parentheses may be either positive or negative. In this case, both are positive.
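The multiple-partial coefficient is the same kind of ratio as the partial coefficient, with two variables added at once. A brief check of the figures above, using variable names of our own:

```python
import math

total = 103.78            # sum of x1 squared
explained_234 = 78.913    # explained by X2, X3, and X4 together
explained_2 = 37.290      # explained by X2 alone

# Relative gain from adding X3 and X4 after X2, from sums of squares.
r2_1_34_2 = (explained_234 - explained_2) / (total - explained_2)

# The same quantity from coefficients of determination.
r2_1_34_2_alt = (0.76039 - 0.35932) / (1 - 0.35932)

# The multiple-partial correlation carries no sign.
r_1_34_2 = math.sqrt(r2_1_34_2)

print(round(r2_1_34_2, 4), round(r2_1_34_2_alt, 4), round(r_1_34_2, 4))
```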

The relationship between the multiple correlation coefficient, the partial correlation coefficient, and the multiple-partial correlation coefficient may be understood more clearly if it is pointed out: (1) that $R_{1.234}$ is simple correlation of $X_1$ with $X_{c1.234}$; (2) that $r_{12.34}$ is simple correlation of $x_{s1.34}$ with $x_{s2.34}$, that is, simple correlation of $(x_1 - b_{13.4}x_3 - b_{14.3}x_4)$ with $(x_2 - b_{23.4}x_3 - b_{24.3}x_4)$; and (3) that $r_{1(23).4}$ is multiple correlation of $x_{s1.4}$ with $x_{s2.4}$ and $x_{s3.4}$.

ANOTHER APPROACH TO MULTIPLE AND PARTIAL
CORRELATION COEFFICIENTS

Sometimes one is presented with the results of a study which show only the zero-order correlation coefficients for a number of variables. If multiple and partial coefficients are wanted, it is possible to obtain them from the zero-order coefficients. The formulas which we shall use for the partial coefficients will also serve to indicate why partial correlation coefficients sometimes become larger and sometimes smaller as more variables are held constant. In the preceding discussion we considered, first, multiple correlation coefficients and then partial coefficients. For the present treatment, it will be advantageous to consider partial coefficients first, since the multiple coefficients for four or more variables are most conveniently obtained by using certain of the partial coefficients.

First-order partial correlation coefficients. Any first-order coefficient may be determined from the values of three zero-order coefficients. For example,

$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}}.$$

Since we shall compute eight of these first-order coefficients, and the reader may wish to ascertain the values of others, there are listed below all of the zero-order $r$, $r^2$, $1 - r^2$, and $\sqrt{1 - r^2}$ values. We shall use some of the $1 - r^2$ values for computing multiple coefficients.


Coefficient       r         r²      1 - r²    √(1 - r²)
r12           +0.5994    0.3593    0.6407     0.8004
r13           +0.2933    0.0860    0.9140     0.9560
r14           +0.5514    0.3040    0.6960     0.8343
r23           -0.3339    0.1115    0.8885     0.9426
r24           +0.5800    0.3364    0.6636     0.8146
r34           -0.3400    0.1156    0.8844     0.9404

When four variables are involved in a correlation problem, there are twelve possible first-order coefficients.¹⁰ For our purposes, we shall compute only eight of these: the six having $X_1$ as the dependent variable and two others, $r_{24.3}$ and $r_{34.2}$, which will be used to obtain second-order partial coefficients. If our objective were merely to obtain the three second-order coefficients, shown in the next section, we would not need the last two of the six first-order coefficients having $X_1$ as the dependent variable.


$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.2933 - (0.5994)(-0.3339)}{(0.8004)(0.9426)} = +0.6540.$$

$$r_{14.2} = \frac{r_{14} - r_{12}r_{24}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{24}^2}} = \frac{0.5514 - (0.5994)(0.5800)}{(0.8004)(0.8146)} = +0.3125.$$

$$r_{14.3} = \frac{r_{14} - r_{13}r_{34}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{34}^2}} = \frac{0.5514 - (0.2933)(-0.3400)}{(0.9560)(0.9404)} = +0.7243.$$

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.5994 - (0.2933)(-0.3339)}{(0.9560)(0.9426)} = +0.7738.$$

$$r_{13.4} = \frac{r_{13} - r_{14}r_{34}}{\sqrt{1 - r_{14}^2}\,\sqrt{1 - r_{34}^2}} = \frac{0.2933 - (0.5514)(-0.3400)}{(0.8343)(0.9404)} = +0.6128.$$

$$r_{12.4} = \frac{r_{12} - r_{14}r_{24}}{\sqrt{1 - r_{14}^2}\,\sqrt{1 - r_{24}^2}} = \frac{0.5994 - (0.5514)(0.5800)}{(0.8343)(0.8146)} = +0.4114.$$

$$r_{24.3} = \frac{r_{24} - r_{23}r_{34}}{\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{34}^2}} = \frac{0.5800 - (-0.3339)(-0.3400)}{(0.9426)(0.9404)} = +0.5262.$$

$$r_{34.2} = \frac{r_{34} - r_{23}r_{24}}{\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{24}^2}} = \frac{-0.3400 - (-0.3339)(0.5800)}{(0.9426)(0.8146)} = -0.1906.$$
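The eight computations above all apply one formula to different triples of zero-order coefficients. A short sketch, with a helper name (`first_order_partial`) of our own, shows the pattern:

```python
import math

def first_order_partial(r_ij, r_ik, r_jk):
    """Correlation of variables i and j with variable k held constant."""
    return (r_ij - r_ik * r_jk) / (
        math.sqrt(1 - r_ik ** 2) * math.sqrt(1 - r_jk ** 2)
    )

# Zero-order coefficients from the text.
r12, r13, r14 = 0.5994, 0.2933, 0.5514
r23, r24, r34 = -0.3339, 0.5800, -0.3400

r13_2 = first_order_partial(r13, r12, r23)
r14_2 = first_order_partial(r14, r12, r24)
r14_3 = first_order_partial(r14, r13, r34)
r12_3 = first_order_partial(r12, r13, r23)
r13_4 = first_order_partial(r13, r14, r34)
r12_4 = first_order_partial(r12, r14, r24)
r24_3 = first_order_partial(r24, r23, r34)
r34_2 = first_order_partial(r34, r23, r24)

for name, value in [("r13.2", r13_2), ("r14.2", r14_2), ("r14.3", r14_3),
                    ("r12.3", r12_3), ("r13.4", r13_4), ("r12.4", r12_4),
                    ("r24.3", r24_3), ("r34.2", r34_2)]:
    print(name, round(value, 4))
```

Since the hand computation uses square roots rounded to four places, the machine values may differ from those above in the fourth decimal.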


We can now see why first-order coefficients are sometimes larger and sometimes smaller than zero-order coefficients. Consider three of the first-order coefficients: (1) $r_{13.2}$ is larger than $r_{13}$. Since $r_{12}$ and $r_{23}$ have unlike signs, and $r_{13}$ is positive, the value of the numerator of the expression for $r_{13.2}$ is larger than $r_{13}$. The fact that the denominator is less than 1.0 serves further to increase the result. (2) $r_{14.2}$ is smaller than $r_{14}$. Since the product of $r_{12}$ and $r_{24}$ does not exceed $r_{14}$, since $r_{12}$ and $r_{24}$ have like signs, and since $r_{14}$ is positive, the value of the numerator of the expression for $r_{14.2}$ is smaller than $r_{14}$. Although the denominator is less than 1.0, it was not enough smaller than 1.0 to increase the result sufficiently to make it equal or exceed $r_{14}$. (3) $r_{34.2}$ is smaller (that is, shows a lower degree of correlation) than $r_{34}$. Since the product of $r_{23}$ and $r_{24}$ does not exceed $r_{34}$ in absolute value, since $r_{23}$ and $r_{24}$ have unlike signs, and since $r_{34}$ is negative, the value of the numerator in the expression for $r_{34.2}$ is a smaller negative value than $r_{34}$. The denominator, though smaller than 1.0, was not small enough to increase the result to a point where it would equal or exceed $r_{34}$.

¹⁰ Proof that these formulas are the equivalent of those we have been using is given in Appendix S, section 21.1. The labor of computation can be materially shortened if values of $\sqrt{1 - r^2}$ are looked up in J. R. Miner, Tables of $\sqrt{1 - r^2}$ and $1 - r^2$ for Use in Partial Correlation and Trigonometry, Johns Hopkins Press, Baltimore, 1936, or Truman Lee Kelley, The Kelley Statistical Tables, revised edition, The Macmillan Company, 1948.

Second-order partial correlation coefficients. Second-order coefficients may be obtained from first-order coefficients. We shall compute only those second-order coefficients having $X_1$ as the dependent variable. They are:

$$r_{14.23} = \frac{r_{14.2} - r_{13.2}r_{34.2}}{\sqrt{1 - r_{13.2}^2}\,\sqrt{1 - r_{34.2}^2}} = \frac{(0.3125) - (0.6540)(-0.1906)}{\sqrt{1 - (0.6540)^2}\,\sqrt{1 - (0.1906)^2}} = +0.5887.$$

$$r_{13.24} = \frac{r_{13.2} - r_{14.2}r_{34.2}}{\sqrt{1 - r_{14.2}^2}\,\sqrt{1 - r_{34.2}^2}} = \frac{(0.6540) - (0.3125)(-0.1906)}{\sqrt{1 - (0.3125)^2}\,\sqrt{1 - (0.1906)^2}} = +0.7653.$$

$$r_{12.34} = \frac{r_{12.3} - r_{14.3}r_{24.3}}{\sqrt{1 - r_{14.3}^2}\,\sqrt{1 - r_{24.3}^2}} = \frac{(0.7738) - (0.7243)(0.5262)}{\sqrt{1 - (0.7243)^2}\,\sqrt{1 - (0.5262)^2}} = +0.6697.$$


Alternative formulas, giving the same results, are available for all three of the second-order coefficients. They are:

$$r_{14.23} = \frac{r_{14.3} - r_{12.3}r_{24.3}}{\sqrt{1 - r_{12.3}^2}\,\sqrt{1 - r_{24.3}^2}};$$

$$r_{13.24} = \frac{r_{13.4} - r_{12.4}r_{23.4}}{\sqrt{1 - r_{12.4}^2}\,\sqrt{1 - r_{23.4}^2}};$$

$$r_{12.34} = \frac{r_{12.4} - r_{13.4}r_{23.4}}{\sqrt{1 - r_{13.4}^2}\,\sqrt{1 - r_{23.4}^2}}.$$

Notice that $r_{14.23}$ is larger than $r_{14.2}$. On the other hand, $r_{14.23}$ is smaller than $r_{14.3}$. Similar comparisons may be made between the other second-order coefficients and the appropriate first-order coefficients.

An expression¹¹ for $r_{1m.23\cdots(m-1)}$ is

$$r_{1m.23\cdots(m-1)} = \frac{r_{1m.23\cdots(m-2)} - r_{1(m-1).23\cdots(m-2)}\,r_{m(m-1).23\cdots(m-2)}}{\sqrt{1 - r_{1(m-1).23\cdots(m-2)}^2}\,\sqrt{1 - r_{m(m-1).23\cdots(m-2)}^2}}.$$
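The general expression lends itself to a recursive computation: a coefficient of any order is built from three coefficients of the next lower order. The sketch below is one possible arrangement; the function name, the dictionary of zero-order coefficients, and the convention of removing the last-listed held variable first are all assumptions of ours.

```python
import math

def partial_corr(zero_order, i, j, held=()):
    """Partial correlation of variables i and j with the variables in
    `held` held constant, built up recursively from lower orders."""
    if not held:
        return zero_order[frozenset((i, j))]
    k, rest = held[-1], held[:-1]
    r_ij = partial_corr(zero_order, i, j, rest)
    r_ik = partial_corr(zero_order, i, k, rest)
    r_jk = partial_corr(zero_order, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

# Zero-order coefficients from the text, keyed by unordered variable pair.
zero_order = {
    frozenset((1, 2)): 0.5994, frozenset((1, 3)): 0.2933,
    frozenset((1, 4)): 0.5514, frozenset((2, 3)): -0.3339,
    frozenset((2, 4)): 0.5800, frozenset((3, 4)): -0.3400,
}

r14_23 = partial_corr(zero_order, 1, 4, (2, 3))
r13_24 = partial_corr(zero_order, 1, 3, (2, 4))
r12_34 = partial_corr(zero_order, 1, 2, (3, 4))

# The order in which held variables are removed should not matter.
r14_32 = partial_corr(zero_order, 1, 4, (3, 2))

print(round(r14_23, 4), round(r13_24, 4), round(r12_34, 4))
```

Reversing the order of the held variables corresponds to the alternative formulas above and gives the same value apart from floating-point rounding.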


It is interesting to pause at this point and inspect some of the results of our computations. Below are shown the zero-order, first-order, and second-order coefficients involving $X_1$ as the dependent variable:

$$\begin{array}{lll}
r_{12} = +0.5994. & r_{12.3} = +0.7738. & r_{12.34} = +0.6697. \\
 & r_{12.4} = +0.4114. & \\
r_{13} = +0.2933. & r_{13.2} = +0.6540. & r_{13.24} = +0.7653. \\
 & r_{13.4} = +0.6128. & \\
r_{14} = +0.5514. & r_{14.2} = +0.3125. & r_{14.23} = +0.5887. \\
 & r_{14.3} = +0.7243. &
\end{array}$$

When no allowance had been made for the effect of other variables, $X_2$ (average age) ranked first and $X_3$ (per cent male) ranked last. When adjustment was made for $X_4$, per cent male $X_3$ ranked ahead of average age $X_2$; when adjustment was made for $X_3$, average age $X_2$ was ahead of business-failure rate $X_4$; when adjustment was made for $X_2$, per cent male $X_3$ ranked above business-failure rate $X_4$. Finally, when two independent variables were held constant, per cent male $X_3$ was first and business-failure rate $X_4$ was last.

Multiple coefficients. It has already been pointed out in footnote 7 that three-variable multiple coefficients may be obtained from the zero-order coefficients. Thus:

$$R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2} = \frac{0.3593 + 0.0860 - 2(0.5994)(0.2933)(-0.3339)}{0.8885} = 0.6333.$$

$$R_{1.23} = 0.7958.$$

$$R_{1.24}^2 = \frac{r_{12}^2 + r_{14}^2 - 2r_{12}r_{14}r_{24}}{1 - r_{24}^2} = \frac{0.3593 + 0.3040 - 2(0.5994)(0.5514)(0.5800)}{0.6636} = 0.4218.$$

$$R_{1.24} = 0.6495.$$

$$R_{1.34}^2 = \frac{r_{13}^2 + r_{14}^2 - 2r_{13}r_{14}r_{34}}{1 - r_{34}^2} = \frac{0.0860 + 0.3040 - 2(0.2933)(0.5514)(-0.3400)}{0.8844} = 0.5654.$$

$$R_{1.34} = 0.7519.$$

¹¹ Other forms may also be written. However, this is the most logical form, since partial coefficients are being built up from those of lower order, using in turn variables $X_2, X_3, X_4, \ldots, X_m$. It would be possible to drop from the subscript of the first $r$ in the numerator, not $(m - 1)$, as was done here, but any subscript other than 1 or $m$. For example, if 3 were dropped, the three coefficients would have as subscripts: $1m.24\cdots(m-1)$; $13.24\cdots(m-1)$; and $m3.24\cdots(m-1)$.
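The three-variable formula needs only the three zero-order coefficients concerned. A small sketch, with a function name of our own, reproduces the three coefficients of multiple determination just obtained:

```python
import math

def multiple_r2_two_predictors(r_1a, r_1b, r_ab):
    """Coefficient of multiple determination of variable 1 on variables a and b,
    computed from the three zero-order correlation coefficients."""
    return (r_1a ** 2 + r_1b ** 2 - 2 * r_1a * r_1b * r_ab) / (1 - r_ab ** 2)

r12, r13, r14 = 0.5994, 0.2933, 0.5514
r23, r24, r34 = -0.3339, 0.5800, -0.3400

R2_1_23 = multiple_r2_two_predictors(r12, r13, r23)
R2_1_24 = multiple_r2_two_predictors(r12, r14, r24)
R2_1_34 = multiple_r2_two_predictors(r13, r14, r34)

for name, value in [("R1.23", R2_1_23), ("R1.24", R2_1_24), ("R1.34", R2_1_34)]:
    print(name, round(value, 4), round(math.sqrt(value), 4))
```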

From the general formula, given on page 550, we may write the following, the first one of which was used on page 546:

$$R_{1.23}^2 = r_{12}^2 + r_{13.2}^2(1 - r_{12}^2) = 0.3593 + (0.4277)(0.6407) = 0.6333; \qquad R_{1.23} = 0.7958.$$

$$R_{1.24}^2 = r_{12}^2 + r_{14.2}^2(1 - r_{12}^2) = 0.3593 + (0.0977)(0.6407) = 0.4219; \qquad R_{1.24} = 0.6495.$$

$$R_{1.34}^2 = r_{13}^2 + r_{14.3}^2(1 - r_{13}^2) = 0.0860 + (0.5246)(0.9140) = 0.5655; \qquad R_{1.34} = 0.7520.$$

$$R_{1.234}^2 = r_{12}^2 + r_{13.2}^2(1 - r_{12}^2) + r_{14.23}^2(1 - R_{1.23}^2),$$
$$= 0.3593 + (0.4277)(0.6407) + (0.3466)(0.3667),$$
$$= 0.7604; \qquad R_{1.234} = 0.8720.$$

Rearranging the formula for $r_{13.2}^2$, given on page 544, we may also write

$$1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2);$$

$$R_{1.23}^2 = 1 - [(1 - r_{12}^2)(1 - r_{13.2}^2)].$$

This expression may be put into a general form for $m$ variables by writing

$$R_{1.234\cdots m}^2 = 1 - [(1 - r_{12}^2)(1 - r_{13.2}^2)(1 - r_{14.23}^2)\cdots(1 - r_{1m.23\cdots(m-1)}^2)].$$
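The product form says that each added variable leaves unexplained a fraction $1 - r^2$ of what was unexplained before it entered. The following sketch, whose function name is ours, applies it to the chain $r_{12}^2$, $r_{13.2}^2$, $r_{14.23}^2$:

```python
def multiple_r2_from_chain(partial_r2_chain):
    """Multiple determination from a chain of partial determination
    coefficients, each of one order higher than the one before it."""
    unexplained_share = 1.0
    for r2 in partial_r2_chain:
        unexplained_share *= (1 - r2)
    return 1 - unexplained_share

# r12 squared, r13.2 squared, r14.23 squared, as quoted in the text.
R2_1_23 = multiple_r2_from_chain([0.3593, 0.4277])
R2_1_234 = multiple_r2_from_chain([0.3593, 0.4277, 0.3466])

print(round(R2_1_23, 4), round(R2_1_234, 4))
```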

A variation of this expression is

$$R_{1.234\cdots m}^2 = 1 - [(1 - R_{1.234\cdots(m-1)}^2)(1 - r_{1m.23\cdots(m-1)}^2)].$$

Coefficients of estimation and standard errors of estimate. When only the values of the zero-order coefficients are known, it is not feasible to undertake to ascertain the various $b$-values and the standard error of estimate. However, if $s_1$, or $\Sigma x_1^2$ and $N$, are known, we can obtain the standard error of estimate from

$$s_{1.234\cdots m} = s_1\sqrt{1 - R_{1.234\cdots m}^2}.$$

To compute the coefficients of estimation requires a knowledge of other standard errors of estimate. Thus,

$$b_{1m.23\cdots(m-1)} = r_{1m.23\cdots(m-1)}\,\frac{s_{1.23\cdots(m-1)}}{s_{m.23\cdots(m-1)}}.$$
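As an illustration of these two expressions, the sketch below recovers $s_{1.234}$ and $b_{12.3}$ for our problem. It assumes that the ratio in the second expression is that of the two standard errors of estimate taken around the variables held constant, which for $b_{12.3}$ means $s_{1.3}$ and $s_{2.3}$; all variable names are ours.

```python
import math

n = 19
sum_x1_sq, sum_x2_sq = 103.78, 109.91

# Standard deviations computed with divisor N, as in the text.
s1 = math.sqrt(sum_x1_sq / n)
s2 = math.sqrt(sum_x2_sq / n)

# Standard error of estimate from s1 and the multiple determination coefficient.
s_1_234 = s1 * math.sqrt(1 - 0.76039)

# Standard errors of estimate of X1 and of X2, each around X3 alone.
r13, r23 = 0.2933, -0.3339
s_1_3 = s1 * math.sqrt(1 - r13 ** 2)
s_2_3 = s2 * math.sqrt(1 - r23 ** 2)

# Net regression coefficient from the partial correlation coefficient.
r12_3 = 0.7739
b12_3 = r12_3 * s_1_3 / s_2_3

print(round(s_1_234, 2), round(b12_3, 4))
```

The value of $b_{12.3}$ obtained in this way should agree closely with the +0.76267 found from the normal equations.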

OTHER MEASURES OF THE INDIVIDUAL IMPORTANCE 
OF THE INDEPENDENT VARIABLES 

We have already considered the coefficients of partial determination or correlation as measures of the individual importance of the three independent variables. Two other measures of the individual importance of the independent variables are occasionally used.

Beta coefficients. It will be remembered that the following relationship was used in simple correlation:

$$r_{12} = b_{12}\frac{s_2}{s_1}.$$

The beta coefficients are akin to this expression, being written

$$\beta_{12.34} = b_{12.34}\frac{s_2}{s_1}, \qquad \beta_{13.24} = b_{13.24}\frac{s_3}{s_1}, \qquad\text{and}\qquad \beta_{14.23} = b_{14.23}\frac{s_4}{s_1}.$$

The reader should not confuse these measures with $\beta_1$ and $\beta_2$ used to describe a frequency distribution. The two sets of measures are entirely different in nature.

For purposes of computation, we shall write

$$\frac{s_2}{s_1} = \sqrt{\frac{\Sigma x_2^2}{\Sigma x_1^2}},$$

and similarly for the other ratios, giving

$$\beta_{12.34} = b_{12.34}\sqrt{\frac{\Sigma x_2^2}{\Sigma x_1^2}} = +0.53534\sqrt{\frac{109.91}{103.78}} = +0.5509.$$

$$\beta_{13.24} = b_{13.24}\sqrt{\frac{\Sigma x_3^2}{\Sigma x_1^2}} = +2.20484\sqrt{\frac{8.44}{103.78}} = +0.6288.$$

$$\beta_{14.23} = b_{14.23}\sqrt{\frac{\Sigma x_4^2}{\Sigma x_1^2}} = +0.05906\sqrt{\frac{5{,}909.20}{103.78}} = +0.4457.$$

The ranks of the three $\beta$ coefficients are the same, for our problem, as were the ranks of the corresponding partial coefficients. This will usually, although not always, be the case.

The expression for $\beta_{1m.23\cdots(m-1)}$ may be written:

$$\beta_{1m.23\cdots(m-1)} = b_{1m.23\cdots(m-1)}\frac{s_m}{s_1}.$$
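The beta coefficients are simply the net regression coefficients rescaled by the ratio of standard deviations. A short sketch, using dictionaries keyed by variable number (our own arrangement), computes all three and their ranking:

```python
import math

# Sums of squared deviations for X1 through X4, from the text.
sum_sq = {1: 103.78, 2: 109.91, 3: 8.44, 4: 5909.20}

# Net regression coefficients b12.34, b13.24, b14.23.
b = {2: 0.53534, 3: 2.20484, 4: 0.05906}

# beta = b * s_j / s_1; the divisor N cancels in the ratio of standard deviations.
beta = {j: b[j] * math.sqrt(sum_sq[j] / sum_sq[1]) for j in b}

# Independent variables ranked from largest to smallest beta.
ranking = sorted(beta, key=beta.get, reverse=True)

for j in (2, 3, 4):
    print(j, round(beta[j], 4))
print(ranking)
```

The ranking places $X_3$ first and $X_4$ last, as with the second-order partial coefficients.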

Coefficients of separate determination. If the expression

$$R_{1.234}^2 = \frac{\Sigma x_{c1.234}^2}{\Sigma x_1^2} = \frac{b_{12.34}\Sigma x_1x_2 + b_{13.24}\Sigma x_1x_3 + b_{14.23}\Sigma x_1x_4}{\Sigma x_1^2}$$

be divided into three parts, designated $d_{12.34}$, $d_{13.24}$, and $d_{14.23}$, so that

$$d_{12.34} = \frac{b_{12.34}\Sigma x_1x_2}{\Sigma x_1^2} = \frac{(0.53534)(64.02)}{103.78} = 0.33024;$$

$$d_{13.24} = \frac{b_{13.24}\Sigma x_1x_3}{\Sigma x_1^2} = \frac{(2.20484)(8.68)}{103.78} = 0.18441; \text{ and}$$

$$d_{14.23} = \frac{b_{14.23}\Sigma x_1x_4}{\Sigma x_1^2} = \frac{(0.05906)(431.80)}{103.78} = 0.24573;$$

we have three coefficients of separate determination which, when added, give $R^2$. That is,

$$R_{1.234}^2 = d_{12.34} + d_{13.24} + d_{14.23},$$
$$0.7604 = 0.33024 + 0.18441 + 0.24573.$$

The expression for $d_{1m.23\cdots(m-1)}$ is

$$d_{1m.23\cdots(m-1)} = \frac{b_{1m.23\cdots(m-1)}\Sigma x_1x_m}{\Sigma x_1^2}.$$

Although the $d$ values may be added to produce $R^2$, they have several shortcomings, one of which is that they are believed to be more subject to sampling error than either the partial coefficients or the beta coefficients. Furthermore, the $d$ values measure not only the determination attributable to the independent variable to the left of the decimal in the subscript, but also a portion of the joint determination of the other independent variables.¹²

It may be of interest that $d_{12.34} = \beta_{12.34}r_{12}$ and that similar expressions may be written for other coefficients of separate determination.
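The coefficients of separate determination, and their agreement with the product of a beta coefficient and the corresponding zero-order coefficient, can be checked as follows. The names are ours, and the identity $d = \beta r$ is taken here as following from the definitions of $\beta$ and $r$.

```python
import math

sum_sq = {1: 103.78, 2: 109.91, 3: 8.44, 4: 5909.20}   # sums of squared deviations
cross = {2: 64.02, 3: 8.68, 4: 431.80}                 # sums of products with x1
b = {2: 0.53534, 3: 2.20484, 4: 0.05906}               # net regression coefficients

# Each d is the share of total variation credited to one independent variable.
d = {j: b[j] * cross[j] / sum_sq[1] for j in b}
R2_1_234 = sum(d.values())

# The same values from beta times the zero-order correlation coefficient.
beta = {j: b[j] * math.sqrt(sum_sq[j] / sum_sq[1]) for j in b}
r1 = {j: cross[j] / math.sqrt(sum_sq[1] * sum_sq[j]) for j in b}
d_alt = {j: beta[j] * r1[j] for j in b}

for j in (2, 3, 4):
    print(j, round(d[j], 5), round(d_alt[j], 5))
print(round(R2_1_234, 4))
```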


MULTIPLE CURVILINEAR CORRELATION 

As in the case of relationships between two variables, the relationship between a dependent variable and one or more independent variables is sometimes non-linear. When this is true, we may use a polynomial or we may transform one or more variables into logarithms, reciprocals, roots, or powers, or convert in some other manner.

Polynomials. If the relationship between $X_1$ and $X_2$ appears to be non-linear, while that between $X_1$ and $X_3$ is linear, the equation type

$$X_{c1.22'3} = a_{1.22'3} + b_{12.2'3}X_2 + b_{12'.23}X_2^2 + b_{13.22'}X_3$$

might be used. This equation would, presumably, result in a greater amount of explained variation than would use of

$$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3.$$

The increase in the amount of explained variation may be tested for significance by using the methods described for partial coefficients of determination in Chapter 20. A polynomial was used for a non-linear multiple correlation analysis on pages 779-784 of the first edition of this text.

¹² For a discussion of these and other points, see M. Ezekiel, Methods of Correlation Analysis, John Wiley and Sons, New York, 1941, second edition, pp. 498-500.

Transformations. Using logarithms, reciprocals, roots, powers, or some other function of the values of one (or more) of the series may result in reducing a non-linear relationship to linear form. For example, an estimating equation might be of one of the following types:

$$X_{c1.23} = a_{1.23} + b_{12.3}\log X_2 + b_{13.2}X_3;$$

Xcl.23 = C^l.23 + bu.Z^i + bl3.2 

$$X_{c1.23} = a_{1.23} + b_{12.3}\frac{1}{X_2} + b_{13.2}X_3;$$

$$\log X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3.$$

Various combinations are also possible. When using a transformation, one should, if possible, formulate a hypothesis concerning the nature of the relationship between the variables, as was done in the case of the $(\sqrt{Y})_c = a + bX$ transformation employed for the data of ponderosa pine trees in Chapter 20.

Graphic Method. Statisticians in the United States Department of 
Agriculture have developed an extremely flexible technique by which 
curves of net relationship and a coefficient of multiple correlation may be 
obtained through successive approximations by means of charts and use 
of mathematics no more advanced than simple arithmetic. While this 
method has distinct limitations, it is useful as an exploratory tool in 
determining the appropriate type of equation to fit by mathematical 
methods. 

Although the graphic method is extremely flexible, it is also highly
subjective. Rarely would two statisticians obtain curves exactly alike
from the same data. Consequently, good results can be obtained only by
persons of experience and good judgment. This is in contrast to the
mathematical procedure based on the method of least squares, in which
case (barring mistakes) only one possible result can be had for a given
equation type. A practical difficulty is also inherent in the graphic
method when a large number of variables is employed. The graphic
approach is not explained in this edition of this text, but the interested
reader is referred to pages 784-789 of the first edition.



Symbols Used in Chapter 22


$a$: the value of $Y_c$ when $X = 0$ in the equation $Y_c = a + bX$.

$a_{1.23}$: value of $X_{c1.23}$ when $X_2 = 0$ and $X_3 = 0$ in the estimating equation
$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3$.

$a_{2.13}$: value of $X_{c2.13}$ when $X_1 = 0$ and $X_3 = 0$ in the estimating equation
$X_{c2.13} = a_{2.13} + b_{21.3}X_1 + b_{23.1}X_3$.

$b$: coefficient of $X$ in the equation $Y_c = a + bX$.

$b_{12.3}$: coefficient of $X_2$ in the estimating equation shown above for $a_{1.23}$.

$b_{13.2}$: coefficient of $X_3$ in the estimating equation shown above for $a_{1.23}$.

$b_{21.3}$: coefficient of $X_1$ in the estimating equation shown above for $a_{2.13}$.

$b_{23.1}$: coefficient of $X_3$ in the estimating equation shown above for $a_{2.13}$.

$N$: the number of pairs of items for two-variable correlation; the number
of sets of items for multiple and partial correlation.

$r$: coefficient of correlation. $r_{12}$, $r_{13}$, $r_{23}$ are coefficients referring,
respectively, to $X_1$ and $X_2$, to $X_1$ and $X_3$, and to $X_2$ and $X_3$.

$r_{12.3}$: coefficient of partial correlation, the values of $X_3$ being held constant.

$s_x$: the standard deviation of the $x$ values.

$s_y$: the standard deviation of the $y$ values.

$\Sigma$: upper-case Greek sigma, meaning "take the sum of."

$x$: deviation of an $X$ value from the trend line for the $X$ values.

$X$: the $X$ series; also, an observed value in the $X$ series. Thus, we refer
to correlating $X$ and $Y$, but $\Sigma X$ means "sum the values in the $X$
series."

$X_1$: the $X_1$ series; also, an observed value in the $X_1$ series. Thus, we
refer to correlating $X_1$ with $X_2$ or with $X_3$, or with both $X_2$ and $X_3$, but
$\Sigma X_1$ means "sum the values in the $X_1$ series."

$X_2$, $X_3$: respectively, the $X_2$ series and the $X_3$ series; also, observed values
in those series. See $X_1$.

$X_{c1.23}$: a computed value of the $X_1$ series when the estimating equation
shown above for $a_{1.23}$ is used.

$X_{c2.13}$: a computed value of the $X_2$ series when the estimating equation
shown above for $a_{2.13}$ is used.

$y$: deviation of a $Y$ value from the trend line for the $Y$ values.

$Y$: the $Y$ series; also, an observed value in the $Y$ series. Thus, we refer
to correlating $X$ and $Y$, but $\Sigma Y$ means "sum the values in the $Y$
series."

$Y_c$: a computed value of the $Y$ series.
CHAPTER 22


Correlation IV: Correlation of Time Series


The problem of correlating the cyclical fluctuations of two, or more, 
time series is basically the same as that of correlating non-chronological 
series. However, when correlating time series, we must take cognizance 
of the fact that trend is usually present in annual data and that both 
trend and seasonal variation, as well as irregular fluctuations, may be
found in monthly data.


ANNUAL DATA 

Table 22.1 shows data of the production of by-product (or oven) coke 
and of beehive coke in the United States for each year, 1941 through 1952. 
From the numerical data, little can be grasped concerning the behavior
of the two series; but when the two series are shown graphically in Charts 
22.1 and 22.2, it is apparent that: (1) the trend of by-product coke pro- 
duction is upward, (2) the trend of beehive coke production is downward, 
and (3) the fluctuations of the two series are positively correlated. 

Correlation of data unadjusted for trend. When correlating two 
time series, we are interested in knowing whether the fluctuations of the 
series move in the same direction or in opposite directions, and whether 
the association is high or low. If our concern is with the trends of the 
two series, rather than with their fluctuations, we would not correlate the 
two trends, since they would of necessity show perfect linear or non- 
linear correlation. Trends are compared either graphically or by 
examining the trend equations. When time series data, unadjusted for 
trend, are correlated, the resulting coefficient reflects both the relationship 
existing between the fluctuations and that between the two trends. The 
data of production of by-product and beehive coke are shown as a scatter 
diagram in Chart 22.3 and the value of the correlation coefficient is found,
in Table 22.1, to be +0.456. This coefficient seems low in view of the
agreement of the fluctuations of the two series shown in Charts 22.1 and
22.2. The difficulty lies in the fact that the two trends are in opposite
directions. The effect of trend may be eliminated by correlating percentages
of trend instead of correlating the raw data. Alternatively, we
may compute the partial correlation coefficient $r_{12.3}$, where the two series
are $X_1$ and $X_2$ and where time is $X_3$. Sometimes the effect of trend is

TABLE 22.1

Correlation of Production of By-product Coke and of Beehive Coke, in the
United States, 1941-1952

(Thousands of short tons)

         By-product     Beehive
            coke          coke
         production    production
Year         X             Y               XY                X²              Y²

1941       58,482        6,704        392,063,328     3,420,144,324     44,943,616
1942       62,295        8,274        515,428,830     3,880,667,025     68,459,076
1943       63,743        7,933        505,673,219     4,063,170,049     62,932,489
1944       67,065        6,973        467,644,245     4,497,714,225     48,622,729
1945       62,094        5,214        323,758,116     3,855,664,836     27,185,796
1946       53,929        4,568        246,347,672     2,908,337,041     20,866,624
1947       66,759        6,687        446,417,433     4,456,764,081     44,715,969
1948       68,284        6,578        449,172,152     4,662,704,656     43,270,084
1949       60,222        3,415        205,658,130     3,626,689,284     11,662,225
1950       66,891        5,827        389,773,857     4,474,405,881     33,953,929
1951       71,990        7,343        528,622,570     5,182,560,100     53,919,649
1952       63,631        4,601        292,766,231     4,048,904,161     21,169,201

Total     765,385       74,117      4,763,325,783    49,077,725,663    481,701,387

Data from Statistical Abstract of the United States, 1950, p. 707, and 1952, p. 701, and from U. S.
Department of Commerce, Office of Business Economics, Business Statistics, 1953 Biennial Edition, p.
170.

$$r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}$$

$$= \frac{12(4{,}763{,}325{,}783) - (765{,}385)(74{,}117)}{\sqrt{[12(49{,}077{,}725{,}663) - (765{,}385)^2][12(481{,}701{,}387) - (74{,}117)^2]}}$$

$$= +0.456.$$

decreased by correlating either (1) the amounts of change from each year
to the next for the two series or (2) the percentages of change from each
year to the next for the two series. We shall examine each of these procedures
in turn.
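The product-moment computation shown beneath Table 22.1 can be checked directly from the twelve pairs of production figures. The following is a minimal Python sketch of that arithmetic; the function name `pearson_r` and the variable names are illustrative choices and do not appear in the text.

```python
import math

# Production in thousands of short tons, 1941-1952 (Table 22.1).
byproduct = [58482, 62295, 63743, 67065, 62094, 53929,
             66759, 68284, 60222, 66891, 71990, 63631]
beehive = [6704, 8274, 7933, 6973, 5214, 4568,
           6687, 6578, 3415, 5827, 7343, 4601]


def pearson_r(xs, ys):
    """Product-moment r from the raw sums, as laid out below Table 22.1."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


# Correlation of the data unadjusted for trend.
r_unadjusted = pearson_r(byproduct, beehive)
print(f"r for unadjusted data: {r_unadjusted:+.3f}")
```

Because the inputs are whole numbers, the sums are exact in Python and rounding enters only at the square root and the final division.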

Correlation of percentages of trend. Obviously, the first step 
consists of determining an appropriate trend for each of the series. For 
our illustration, linear trends will suffice, and Table 22.2 shows the com- 
putation of the trend equation, the trend values, and the percentages of 
trend for by-product coke. Similar computations are shown for beehive
Chart 22.1. Production of By-product Coke in the United States and Straight-
Line Trend, 1941-1952. Data from Table 22.2.

Chart 22.2. Production of Beehive Coke in the United States and Straight-
Line Trend, 1941-1952. Data from Table 22.3.

Chart 22.3. Scatter Diagram of Production of By-product and of Beehive
Coke, 1941-1952. Data of Table 22.1. (Horizontal axis: by-product coke.)


TABLE 22.2

Determination of Trend and Computation of Per-Cent-of-Trend
Values for Production of By-product Coke, 1941-1952

                Production                     Trend       Per cent
                (000 short                     values      of trend
Year      X       tons) Y          XY           Y_c       (Y ÷ Y_c)

1941     -11      58,482        -643,302      60,645.3      96.43
1942     - 9      62,295        -560,655      61,215.6     101.76
1943     - 7      63,743        -446,201      61,785.9     103.17
1944     - 5      67,065        -335,325      62,356.2     107.55
1945     - 3      62,094        -186,282      62,926.6      98.68
1946     - 1      53,929        - 53,929      63,496.9      84.93
1947       1      66,759          66,759      64,067.2     104.20
1948       3      68,284         204,852      64,637.6     105.64
1949       5      60,222         301,110      65,207.9      92.35
1950       7      66,891         468,237      65,778.2     101.69
1951       9      71,990         647,910      66,348.6     108.50
1952      11      63,631         699,941      66,918.9      95.09

Total      0     765,385         163,115

Data from sources given below Table 22.1.

$$N = 12. \qquad \Sigma X^2 = 2(286) = 572.$$

$$a = \frac{\Sigma Y}{N} = \frac{765{,}385}{12} = 63{,}782.08.$$

$$b = \frac{\Sigma XY}{\Sigma X^2} = \frac{163{,}115}{572} = 285.166.$$

$$Y_c = 63{,}782.08 + 285.166X.$$

Origin, between 1946 and 1947.
X units, ½ year.
TABLE 22.3

Determination of Trend and Computation of Per-Cent-of-Trend
Values for Production of Beehive Coke, 1941-1952

                Production                     Trend       Per cent
                (000 short                     values      of trend
Year      X       tons) Y          XY           Y_c       (Y ÷ Y_c)

1941     -11       6,704         -73,744       7,288.6      91.98
1942     - 9       8,274         -74,466       7,086.4     116.76
1943     - 7       7,933         -55,531       6,884.2     115.23
1944     - 5       6,973         -34,865       6,682.0     104.35
1945     - 3       5,214         -15,642       6,479.7      80.47
1946     - 1       4,568         - 4,568       6,277.5      72.77
1947       1       6,687           6,687       6,075.3     110.07
1948       3       6,578          19,734       5,873.1     112.00
1949       5       3,415          17,075       5,670.9      60.22
1950       7       5,827          40,789       5,468.7     106.55
1951       9       7,343          66,087       5,266.5     139.43
1952      11       4,601          50,611       5,064.2      90.85

Total      0      74,117         -57,833

Data from sources given below Table 22.1.

$$N = 12. \qquad \Sigma X^2 = 2(286) = 572.$$

$$a = \frac{\Sigma Y}{N} = \frac{74{,}117}{12} = 6{,}176.42.$$

$$b = \frac{\Sigma XY}{\Sigma X^2} = \frac{-57{,}833}{572} = -101.107.$$

$$Y_c = 6{,}176.42 - 101.107X.$$

Origin, between 1946 and 1947.
X units, ½ year.



Chart 22.4. Percentages of Trend of By-product Coke and of Beehive Coke
Production, 1941-1952. Data of Tables 22.2 and 22.3.
coke in Table 22.3. The two sets of per-cent-of-trend data have been
plotted in Chart 22.4, where it may be seen that whenever one series is
above (or below) its trend line, the other series is also above (or below) its
trend line. Chart 22.4 gives us no adequate picture of the closeness of
the relationship; that purpose is served by Chart 22.5, which is a scatter

Chart 22.5. Scatter Diagram of Percentages of Trend of Production
of By-product Coke and of Beehive Coke, 1941-1952. Data of
Table 22.4. (Horizontal axis: by-product coke, per cent of trend;
vertical axis: beehive coke, per cent of trend.)

diagram of the two series of percentages of trend. From this scatter plot 
it is clear that fairly high positive correlation is present between the per- 
centages of trend for the two series, and the value of r is found, in Table 
22.4, to be +0.838. 
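The steps of Tables 22.2 through 22.4 (fit a straight-line trend to each series, express each year as a percentage of its trend value, then correlate the percentages) can be sketched in Python as follows. This is an illustrative sketch only: the function names are assumptions, and because the percentages are carried at full precision rather than rounded to two decimals, the result agrees with the tabled +0.838 only to about the third decimal place.

```python
import math

# Production in thousands of short tons, 1941-1952 (Table 22.1).
byproduct = [58482, 62295, 63743, 67065, 62094, 53929,
             66759, 68284, 60222, 66891, 71990, 63631]
beehive = [6704, 8274, 7933, 6973, 5214, 4568,
           6687, 6578, 3415, 5827, 7343, 4601]


def trend_line(ys):
    """Least-squares straight line with the origin at the middle of the period.

    X is coded ..., -3, -1, 1, 3, ... (half-year units for an even number
    of years), so that sum(X) = 0, a = sum(Y)/N and b = sum(XY)/sum(X^2).
    """
    n = len(ys)
    xs = [2 * i - (n - 1) for i in range(n)]
    a = sum(ys) / n
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return a, b, [a + b * x for x in xs]


def percent_of_trend(ys):
    """Each observed value as a percentage of its trend value."""
    _, _, trend = trend_line(ys)
    return [100 * y / t for y, t in zip(ys, trend)]


def pearson_r(xs, ys):
    """Product-moment r computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


a_bp, b_bp, _ = trend_line(byproduct)
a_bh, b_bh, _ = trend_line(beehive)
r_pct = pearson_r(percent_of_trend(byproduct), percent_of_trend(beehive))
print(f"by-product trend: Yc = {a_bp:.2f} + {b_bp:.3f}X")
print(f"beehive trend:    Yc = {a_bh:.2f} + {b_bh:.3f}X")
print(f"r for percentages of trend: {r_pct:+.3f}")
```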

The situation pictured in the foregoing tables and charts is but one of
four possibilities.¹ They are:

1. The fluctuations of two time series may be positively correlated, 
but the trends may be in opposite directions. Correlating the data with- 
out adjusting for trend, instead of correlating percentages of trend, will 

¹ Throughout the discussion in this chapter, we consider only linear trends and
linear correlation. When dealing with non-linear trends and/or non-linear correlation
of fluctuations, the results of failing to eliminate trend cannot be so simply stated
as when only linear relationships are involved. However, if a trend is non-linear,
it is just as important that its effect be eliminated as if the trend were linear.
result in lowering the positive correlation coefficient or may even change
it to a negative coefficient, if the trends are marked in relation to the
fluctuations. In the preceding illustration, r = +0.838 for the per-cent-of-trend
data, while r = +0.456 for the unadjusted production data in
tons.

2. The fluctuations of two time series may be positively correlated, 
and the trends may be in the same direction. Correlating the data 

TABLE 22.4

Correlation of Percentages of Trend of Production of By-product Coke and of
Beehive Coke, 1941-1952

         By-product    Beehive
            coke         coke
Year         X            Y             XY              X²              Y²

1941        96.43        91.98      8,869.6314      9,298.7449      8,460.3204
1942       101.76       116.76     11,881.4976     10,355.0976     13,632.8976
1943       103.17       115.23     11,888.2791     10,644.0489     13,277.9529
1944       107.55       104.35     11,222.8425     11,567.0025     10,888.9225
1945        98.68        80.47      7,940.7796      9,737.7424      6,475.4209
1946        84.93        72.77      6,180.3561      7,213.1049      5,295.4729
1947       104.20       110.07     11,469.2940     10,857.6400     12,115.4049
1948       105.64       112.00     11,831.6800     11,159.8096     12,544.0000
1949        92.35        60.22      5,561.3170      8,528.5225      3,626.4484
1950       101.69       106.55     10,835.0695     10,340.8561     11,352.9025
1951       108.50       139.43     15,128.1550     11,772.2500     19,440.7249
1952        95.09        90.85      8,638.9265      9,042.1081      8,253.7225

Total    1,199.99     1,200.68    121,447.8283    120,516.9275    125,364.1904

Data from Tables 22.2 and 22.3.

$$r = \frac{12(121{,}447.8283) - (1{,}199.99)(1{,}200.68)}{\sqrt{[12(120{,}516.9275) - (1{,}199.99)^2][12(125{,}364.1904) - (1{,}200.68)^2]}}$$

$$= +0.838.$$

without adjusting for trend, instead of correlating percentages of trend, 
will result in increasing the positive correlation coefficient. (If the per- 
centages of trend showed r = +1.0, ignoring the trends and correlating 
the unadjusted data could not result in a higher value for r.) Although 
the data cover an extremely short period, the production of pig iron and
the production of steel ingots and steel for castings for 1946-1952 will
serve to illustrate the principle involved. Table 22.5 shows the data, the
behavior of which may be seen in Chart 22.6. Chart 22.6 also shows the
trends of the two series, both of which are upward. It is apparent from
the chart that the fluctuations of the two series about their trends have a
high positive correlation. Correlating, first, the unadjusted data, we
find in Table 22.5 that r = +0.995. When the two series are each put
in terms of percentages of trend, the values are those shown in Table 22.6.
This table shows, also, that correlating the per-cent-of-trend data yields
r = +0.994. The per-cent-of-trend figures are so closely related that
ignoring the trends could not increase the coefficient very much.

TABLE 22.5

Correlation of Production of Pig Iron and Production of Steel Ingots and
Steel for Castings, 1946-1952

(Millions of short tons.)

                     Steel ingots
                      and steel
         Pig iron    for castings
Year        X             Y            XY            X²            Y²

1946       45.6          66.6       3,036.96      2,079.36      4,435.56
1947       59.3          84.9       5,034.57      3,516.49      7,208.01
1948       61.0          88.6       5,404.60      3,721.00      7,849.96
1949       54.2          78.0       4,227.60      2,937.64      6,084.00
1950       65.4          96.8       6,330.72      4,277.16      9,370.24
1951       71.2         105.2       7,490.24      5,069.44     11,067.04
1952       62.2          93.2       5,797.04      3,868.84      8,686.24

Total     418.9         613.3      37,321.73     25,469.93     54,701.05

Data from U. S. Department of Commerce, Office of Business Economics, Business Statistics, 1953
Biennial Edition, pp. 158 and 159.

$$r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}$$

$$= \frac{7(37{,}321.73) - (418.9)(613.3)}{\sqrt{[7(25{,}469.93) - (418.9)^2][7(54{,}701.05) - (613.3)^2]}}$$

$$= +0.995.$$


3. The fluctuations of two time series may be negatively correlated, 
but the trends may be in the same direction. Correlating the data with- 
out adjusting for trend, instead of correlating percentages of trend, will 
result in lowering the negative correlation coefficient or may even change 
it to a positive coefficient if the trends are pronounced in relation to the 
fluctuations. 

4. The fluctuations of two time series may be negatively correlated and 
the trends may be in opposite directions. Correlating the data without 
adjusting for trend, instead of correlating percentages of trend, will result 
in increasing the negative correlation coefficient. (If the percentages of
trend showed r = -1.0, ignoring the trends and correlating the unadjusted
data could not result in a higher value for r.)

If two time series are to be correlated, and if both series have horizontal 
trends, it is, of course, not necessary to express the data as percentages of 
trend. However, if one of the two series has an upward or downward 
trend, a suitable correlation of the fluctuations of the two series will not
be obtained unless trend is eliminated from the series showing trend. 

It occasionally happens that annual data for one series are regularly 
known, or made available, before the corresponding yearly figure for 
another, closely correlated series. In such a situation, if the correlation 

Chart 22.6. Production of Pig Iron and Production of Steel Ingots and
Steel for Castings, with Straight-Line Trends, 1946-1952. Data of production
from Table 22.5. The trends were computed from these figures. (Vertical
axis: millions of short tons.)

is high, a useful estimate may be made for the series which is not so 
promptly available. The procedure consists of: (1) expressing the figure 
which is first available as a percentage of the extended trend for that 
series, (2) estimating a per-cent-of-trend figure for the other series by use 
of an estimating equation obtained from a table like Table 22.4, and (3) 
converting this estimated per-cent-of-trend figure into the units in which 
the series is expressed (tons, dollars, index numbers, and so on) by taking 
the estimated per cent of trend of the extended trend value for that series. 
We shall not give a numerical illustration of the foregoing, since most
series, including by-product and beehive coke, are available on a monthly 
basis, and, when data are already known for eleven months of a year, an 
estimate of the annual total for that series based only on the annual total 
for another series can be of little use. It should be clear that the pro- 
cedure assumes a continuation of the relationship existing between the 
two sets of fluctuations, and also a continuation of the two trend lines. 

TABLE 22.6

Correlation of Percentages of Trend of Production of Pig Iron and Production
of Steel Ingots and Steel for Castings, 1946-1952

                     Steel ingots
                      and steel
         Pig iron    for castings
Year        X             Y            XY            X²            Y²

1946       88.5          90.2       7,982.70      7,832.25      8,136.04
1947      109.2         108.3      11,826.36     11,924.64     11,728.89
1948      106.8         106.7      11,395.56     11,406.24     11,384.89
1949       90.6          89.0       8,063.40      8,208.36      7,921.00
1950      104.5         105.0      10,972.50     10,920.25     11,025.00
1951      108.9         108.7      11,837.43     11,859.21     11,815.69
1952       91.2          91.9       8,381.28      8,317.44      8,445.61

Total     699.7         699.8      70,459.23     70,468.39     70,457.12

The per-cent-of-trend figures were obtained from the production data of Table 22.5, using the trends
shown in Chart 22.6.

$$r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}$$

$$= \frac{7(70{,}459.23) - (699.7)(699.8)}{\sqrt{[7(70{,}468.39) - (699.7)^2][7(70{,}457.12) - (699.8)^2]}}$$

$$= +0.994.$$

Correlation of fluctuations when data have been divided by s. In Chapter 
16 it was pointed out that time series having different amplitudes of fluc- 
tuation are easier to compare graphically if each set of adjusted data is 
divided by its standard deviation. When two series² of deviations have
been expressed in terms of their respective standard deviations, the
product-moment formula for the correlation coefficient becomes

$$r = \frac{\Sigma xy}{N s_x s_y} = \frac{1}{N}\Sigma\left(\frac{x}{s_x}\cdot\frac{y}{s_y}\right).$$

Thus we obtain r by merely (1) multiplying the paired values, (2) adding,
and (3) dividing by N. (Note that $s_x = s_X$ and $s_y = s_Y$, since


² The series may be chronological or non-chronological. For example, two sets of
paired grades expressed as deviations from their means and in terms of their standard
deviations (sometimes called standard scores) may be correlated as shown in Table 22.7.

adding, or subtracting, a constant does not alter the value of s for a series
of values.) The data of by-product coke and beehive coke production
provide an excellent illustration, since it is apparent in Chart 22.4
that the fluctuations in beehive coke production are more pronounced,
in terms of percentages of trend, than the fluctuations in by-product
coke production. In fact, for 11 of the 12 years shown in Chart 22.4,


TABLE 22.7

Correlation of Percentage Deviations from Trend Expressed in Terms of s
for By-product Coke and Beehive Coke, 1941-1952

              By-product coke                    Beehive coke
Year       x         x²        x/s_x        y          y²         y/s_y     (x/s_x)(y/s_y)

1941    - 3.57     12.7449    -0.541     - 8.02       64.3204    -0.384      + 0.207744
1942    + 1.76      3.0976    +0.268     +16.76      280.8976    +0.803      + 0.215204
1943    + 3.17     10.0489    +0.482     +15.23      231.9529    +0.730      + 0.351860
1944    + 7.55     57.0025    +1.148     + 4.35       18.9225    +0.208      + 0.238784
1945    - 1.32      1.7424    -0.201     -19.53      381.4209    -0.936      + 0.188136
1946    -15.07    227.1049    -2.292     -27.23      741.4729    -1.305      + 2.991060
1947    + 4.20     17.6400    +0.639     +10.07      101.4049    +0.482      + 0.307998
1948    + 5.64     31.8096    +0.858     +12.00      144.0000    +0.575      + 0.493350
1949    - 7.65     58.5225    -1.163     -39.78    1,582.4484    -1.906      + 2.216678
1950    + 1.69      2.8561    +0.257     + 6.55       42.9025    +0.314      + 0.080698
1951    + 8.50     72.2500    +1.293     +39.43    1,554.7249    +1.889      + 2.442477
1952    - 4.91     24.1081    -0.747     - 9.15       83.7225    -0.438      + 0.327186

Total             518.9275                         5,228.1904                +10.061175

The x and y values are the values in the last columns of Table 22.2 and Table 22.3
expressed as deviations from 100.00. The sum of the percentage deviations from a
trend line is ordinarily not exactly zero. However, if the trend has been fitted by
least squares to data covering the same period as the data under consideration, the
discrepancy may be expected to be so slight that it may be ignored. Including the
correction factors $(\Sigma x / N)^2$ and $(\Sigma y / N)^2$ below does not alter the figures
in the third decimal place for $s_x$ and $s_y$.

$$s_x = \sqrt{\frac{\Sigma x^2}{N}} = \sqrt{\frac{518.9275}{12}} = 6.576.$$

$$s_y = \sqrt{\frac{\Sigma y^2}{N}} = \sqrt{\frac{5{,}228.1904}{12}} = 20.873.$$

$$r = \frac{1}{N}\Sigma\left(\frac{x}{s_x}\cdot\frac{y}{s_y}\right) = \frac{1}{12}(+10.061175) = +0.838.$$


the beehive coke per-cent-of-trend values are farther removed from 
the 100 line than are the by-product values. Furthermore, six of 
the beehive coke fluctuations exceed the largest fluctuation shown 
by by-product coke. In Table 22.7 the two series are expressed as 
percentage deviations from trend, and the necessary computations 
are made for the determination of the standard deviations. Below 
the table it is seen that $s_x$, the standard deviation for by-product
coke, is 6.576, and that $s_y$, the standard deviation for beehive coke, is
20.873. Table 22.7 also shows the $x/s_x$ and $y/s_y$ values. These two sets of
values are shown, as time series, in Chart 22.7. What has been accomplished
by dividing each series by its standard deviation may be seen by
comparing Charts 22.7 and 22.4. If a scatter plot were to be drawn of
the $x/s_x$ and $y/s_y$ values, it would be exactly the same as Chart 22.5 except that
the scales would differ. Table 22.7 shows the computation of r for the
$x/s_x$ and $y/s_y$ values, and it is found to be +0.838, identical with the value
obtained in Table 22.4.

Chart 22.7. Production of By-product Coke and of Beehive Coke, Expressed
as Percentage Deviations from Trend and in Terms of Their Standard Deviations,
1941-1952. Data from Table 22.7. (Vertical axis: standard deviations.)
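The computation of Table 22.7 can be sketched in Python as below, starting from the rounded per-cent-of-trend figures of Table 22.4. One point in this sketch is an assumption that departs slightly from the table: deviations are measured from each series' own mean rather than from 100.00, which amounts to including the small correction factors mentioned in the note to Table 22.7. The function name `standard_units` is illustrative.

```python
import math

# Percentages of trend, 1941-1952 (Table 22.4).
pct_byproduct = [96.43, 101.76, 103.17, 107.55, 98.68, 84.93,
                 104.20, 105.64, 92.35, 101.69, 108.50, 95.09]
pct_beehive = [91.98, 116.76, 115.23, 104.35, 80.47, 72.77,
               110.07, 112.00, 60.22, 106.55, 139.43, 90.85]


def standard_units(values):
    """Return s and the deviations from the mean divided by s.

    s is computed with divisor N, as in the text.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return s, [(v - mean) / s for v in values]


s_x, x_over_s = standard_units(pct_byproduct)
s_y, y_over_s = standard_units(pct_beehive)

# (1) multiply the paired values, (2) add, (3) divide by N.
r_standard = sum(u * v for u, v in zip(x_over_s, y_over_s)) / len(x_over_s)

print(f"s_x = {s_x:.3f}, s_y = {s_y:.3f}, r = {r_standard:+.3f}")
```

Dividing by s changes only the scale of each series, which is why the coefficient is the same as that obtained from the percentages of trend themselves.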

Correlation of unadjusted data with time as a third variable.
Another procedure for correlating the fluctuations of two time series consists
of determining the partial correlation existing between the two series
when time is held constant. The partial correlation coefficient which is
computed is $r_{12.3}$, where $X_1$ and $X_2$ are the two time series and $X_3$ represents
the years, which, for convenience, are taken with the origin in the
middle of the period. Table 22.8 shows the sums necessary for determining
$r_{12}$, $r_{13}$, and $r_{23}$ and, from these, $r_{12.3}$. Note that all of the totals
TABLE 22.8

Computations for Partial and Multiple Correlation of Production of Beehive
Coke, X₁, Production of By-product Coke, X₂, and Time, X₃, 1941-1952

(Production figures are in thousands of short tons.)

                   By-
        Beehive  product  Time
Year      X₁       X₂      X₃       X₁X₂          X₁X₃      X₂X₃         X₁²             X₂²

1941     6,704   58,482   -11    392,063,328    -73,744   -643,302    44,943,616    3,420,144,324
1942     8,274   62,295   - 9    515,428,830    -74,466   -560,655    68,459,076    3,880,667,025
1943     7,933   63,743   - 7    505,673,219    -55,531   -446,201    62,932,489    4,063,170,049
1944     6,973   67,065   - 5    467,644,245    -34,865   -335,325    48,622,729    4,497,714,225
1945     5,214   62,094   - 3    323,758,116    -15,642   -186,282    27,185,796    3,855,664,836
1946     4,568   53,929   - 1    246,347,672    - 4,568   - 53,929    20,866,624    2,908,337,041
1947     6,687   66,759     1    446,417,433      6,687     66,759    44,715,969    4,456,764,081
1948     6,578   68,284     3    449,172,152     19,734    204,852    43,270,084    4,662,704,656
1949     3,415   60,222     5    205,658,130     17,075    301,110    11,662,225    3,626,689,284
1950     5,827   66,891     7    389,773,857     40,789    468,237    33,953,929    4,474,405,881
1951     7,343   71,990     9    528,622,570     66,087    647,910    53,919,649    5,182,560,100
1952     4,601   63,631    11    292,766,231     50,611    699,941    21,169,201    4,048,904,161

Total   74,117  765,385     0  4,763,325,783    -57,833    163,115   481,701,387   49,077,725,663

Data from sources given below Table 22.1.

$$\Sigma X_3^2 = 2(286) = 572.$$

$$r_{12} = \frac{N\Sigma X_1X_2 - (\Sigma X_1)(\Sigma X_2)}{\sqrt{[N\Sigma X_1^2 - (\Sigma X_1)^2][N\Sigma X_2^2 - (\Sigma X_2)^2]}}$$

$$= \frac{12(4{,}763{,}325{,}783) - (74{,}117)(765{,}385)}{\sqrt{[12(481{,}701{,}387) - (74{,}117)^2][12(49{,}077{,}725{,}663) - (765{,}385)^2]}} = +0.456428.$$

$$r_{13} = \frac{N\Sigma X_1X_3 - (\Sigma X_1)(\Sigma X_3)}{\sqrt{[N\Sigma X_1^2 - (\Sigma X_1)^2][N\Sigma X_3^2 - (\Sigma X_3)^2]}}$$

$$= \frac{12(-57{,}833) - (74{,}117)(0)}{\sqrt{[12(481{,}701{,}387) - (74{,}117)^2][12(572) - (0)^2]}} = -0.494381.$$

$$r_{23} = \frac{N\Sigma X_2X_3 - (\Sigma X_2)(\Sigma X_3)}{\sqrt{[N\Sigma X_2^2 - (\Sigma X_2)^2][N\Sigma X_3^2 - (\Sigma X_3)^2]}}$$

$$= \frac{12(163{,}115) - (765{,}385)(0)}{\sqrt{[12(49{,}077{,}725{,}663) - (765{,}385)^2][12(572) - (0)^2]}} = +0.423071.$$

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$

$$= \frac{+0.456428 - (-0.494381)(0.423071)}{\sqrt{1 - (0.494381)^2}\,\sqrt{1 - (0.423071)^2}} = +0.845.$$

shown in Table 22.8 could have been obtained from Tables 22.1, 22.2,
and 22.3. From the computations shown below Table 22.8, we find
$r_{12.3}$ = +0.845.
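The computations below Table 22.8 can be reproduced with a short Python sketch: the three simple coefficients are found from the raw sums and then combined in the partial-correlation formula. The function names `pearson_r` and `partial_r` are illustrative and are not taken from the text.

```python
import math

# Table 22.8: X1 = beehive coke, X2 = by-product coke (thousands of short
# tons), X3 = time with the origin between 1946 and 1947.
beehive = [6704, 8274, 7933, 6973, 5214, 4568,
           6687, 6578, 3415, 5827, 7343, 4601]
byproduct = [58482, 62295, 63743, 67065, 62094, 53929,
             66759, 68284, 60222, 66891, 71990, 63631]
time = [2 * i - 11 for i in range(12)]  # -11, -9, ..., 9, 11


def pearson_r(xs, ys):
    """Product-moment r from the raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def partial_r(r12, r13, r23):
    """Partial correlation of variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / (math.sqrt(1 - r13 ** 2) * math.sqrt(1 - r23 ** 2))


r12 = pearson_r(beehive, byproduct)
r13 = pearson_r(beehive, time)
r23 = pearson_r(byproduct, time)
r12_3 = partial_r(r12, r13, r23)

print(f"r12 = {r12:+.6f}, r13 = {r13:+.6f}, r23 = {r23:+.6f}")
print(f"r12.3 = {r12_3:+.3f}")
```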

If it were desired to express the relationship existing between the three 
variables by means of a multiple estimating equation, such as was used 
in Chapter 21, and if beehive coke were the dependent³ variable $X_1$, we
would use the equation type

$$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3,$$

where, as in Table 22.8, $X_2$ refers to the production of by-product coke
and $X_3$ is time, with the origin for $X_3$ between 1946 and 1947 and the $X_3$
units one year. If such an equation is used to estimate an annual figure
for one series from a more promptly available figure for another series, it
assumes the continuation of the straight-line trends for both series and a
continuation of the same relationship between the fluctuations of the two
series.

It is of more than passing interest that the partial and multiple correla- 
tion analysis set forth in Table 22.8 is exactly the same as if we were to 
correlate the amounts of deviation from the trends in Tables 22.2 and 22.3. 
To demonstrate this, Table 22.9 has been made, which shows the absolute 
deviations from trend for by-product coke and for beehive coke. Below 
Table 22.9 it is seen that, when the absolute deviations from trend are
correlated, r = +0.845, the same value obtained for $r_{12.3}$ in Table 22.8.
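The equality of the two procedures can be verified numerically. The Python sketch below, under the assumption of straight-line trends fitted by least squares to the same twelve years, correlates the absolute deviations from trend and compares the result with the partial coefficient obtained from the three simple coefficients. The function names are illustrative.

```python
import math

beehive = [6704, 8274, 7933, 6973, 5214, 4568,
           6687, 6578, 3415, 5827, 7343, 4601]
byproduct = [58482, 62295, 63743, 67065, 62094, 53929,
             66759, 68284, 60222, 66891, 71990, 63631]
time = [2 * i - 11 for i in range(12)]  # origin between 1946 and 1947


def pearson_r(xs, ys):
    """Product-moment r computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def deviations_from_trend(ys, ts):
    """Absolute deviations of ys from the least-squares straight line on ts."""
    n = len(ys)
    mt, my = sum(ts) / n, sum(ys) / n
    b = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / sum((t - mt) ** 2 for t in ts)
    return [y - (my + b * (t - mt)) for t, y in zip(ts, ys)]


# Procedure of Table 22.9: correlate the deviations from trend.
r_deviations = pearson_r(deviations_from_trend(byproduct, time),
                         deviations_from_trend(beehive, time))

# Procedure of Table 22.8: partial correlation with time held constant.
r12 = pearson_r(beehive, byproduct)
r13 = pearson_r(beehive, time)
r23 = pearson_r(byproduct, time)
r_partial = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

print(f"deviations from trend: {r_deviations:+.6f}")
print(f"partial correlation:   {r_partial:+.6f}")
```

The two figures agree to within floating-point rounding, since the deviations from a least-squares line on time are exactly the parts of each series not associated linearly with time.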

Since the multiple and partial correlation procedure produces the same
results as correlating absolute differences from trend, the former procedure
is subject to the same disadvantage as the latter. This disadvantage
was noted on pages 365-367, where it was pointed out that
relative deviations from trend are usually more meaningful than absolute
deviations from trend. The fact that the value of r obtained for the absolute
deviations from trend is slightly larger than that for the percentages of
trend should not be construed as an argument in favor of using absolute
deviations from trend. One or a few large absolute deviations would
have a marked effect on the value of r, as noted in Chapter 19 (see Charts
19.9 and 19.10 and accompanying discussion).

Correlation of amounts of change or percentages of change. 
Occasionally, the relationship between the fluctuations of two time series 
may be studied by computing the amount of change from each year to the 
following year for both series and then correlating the paired amounts of 
change, which will have positive and negative values. This procedure is 
not recommended since: (1) using amounts of change results in the loss of
one pair of values and (2) if the trend is non-linear, the first differences
of values fluctuating around that trend will still contain a trend element.

³ If by-product coke were the dependent variable, the equation would be

$$X_{c2.13} = a_{2.13} + b_{21.3}X_1 + b_{23.1}X_3,$$

or the identification of variables $X_1$ and $X_2$ could be interchanged and the equation
given above could be used.
This trend element could even be in the opposite direction to the original
trend.

Alternatively, percentages of change may be computed for each of the
two series and the paired percentages may be correlated. Here again, we
would have one fewer pair of values than the number of years involved.
Also, the percentages of change would still contain an element of trend if
the trend for a series were not an exponential curve (page 291).

Note that in both of these procedures different functions of the basic
data than those previously discussed would be correlated.
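For readers who wish to try the two procedures just described, the following Python sketch computes year-to-year amounts of change and percentages of change for the coke series of Table 22.1 and correlates each set of paired values. The text reports no coefficients for these procedures, so none is quoted here; the function names are illustrative assumptions.

```python
import math

byproduct = [58482, 62295, 63743, 67065, 62094, 53929,
             66759, 68284, 60222, 66891, 71990, 63631]
beehive = [6704, 8274, 7933, 6973, 5214, 4568,
           6687, 6578, 3415, 5827, 7343, 4601]


def amounts_of_change(ys):
    """Change from each year to the next (one value fewer than the series)."""
    return [later - earlier for earlier, later in zip(ys, ys[1:])]


def percentages_of_change(ys):
    """Percentage change from each year to the next."""
    return [100 * (later - earlier) / earlier for earlier, later in zip(ys, ys[1:])]


def pearson_r(xs, ys):
    """Product-moment r computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r_amounts = pearson_r(amounts_of_change(byproduct), amounts_of_change(beehive))
r_percentages = pearson_r(percentages_of_change(byproduct),
                          percentages_of_change(beehive))

print(f"pairs available: {len(amounts_of_change(byproduct))}")
print(f"r, amounts of change:     {r_amounts:+.3f}")
print(f"r, percentages of change: {r_percentages:+.3f}")
```

Twelve years yield only eleven pairs, which illustrates the loss of one pair of values noted above.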

TABLE 22.9

Correlation of Absolute Deviations from Trend of Production of By-product
Coke and of Beehive Coke, 1941-1952

(Thousands of short tons.)

Year     By-product     Beehive            XY                  X²                 Y²
             X             Y

1941     -2,163.3      -  584.6      1,264,665.18       4,679,866.89        341,757.16
1942     +1,079.4      +1,187.6      1,281,895.44       1,165,104.36      1,410,393.76
1943     +1,957.1      +1,048.8      2,052,606.48       3,830,240.41      1,099,981.44
1944     +4,708.8      +  291.0      1,370,260.80      22,172,797.44         84,681.00
1945     -  832.6      -1,265.7      1,053,821.82         693,222.76      1,601,996.49
1946     -9,567.9      -1,709.5     16,356,325.05      91,544,710.41      2,922,390.25
1947     +2,691.8      +  611.7      1,646,574.06       7,245,787.24        374,176.89
1948     +3,646.4      +  704.9      2,570,347.36      13,296,232.96        496,884.01
1949     -4,985.9      -2,255.9     11,247,691.81      24,859,198.81      5,089,084.81
1950     +1,112.8      +  358.3        398,716.24       1,238,323.84        128,378.89
1951     +5,641.4      +2,076.5     11,714,367.10      31,825,393.96      4,311,852.25
1952     -3,287.9      -  463.2      1,522,955.28      10,810,286.41        214,554.24

Total    +    0.1      -    0.1     52,480,226.62     213,361,165.49     18,076,131.19

The deviations were obtained from the production and trend data of Tables 22.2 and 22.3.

$$r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}$$

$$= \frac{12(52,480,226.62) - (0.1)(-0.1)}{\sqrt{[12(213,361,165.49) - (0.1)^2][12(18,076,131.19) - (-0.1)^2]}}$$

$$= +0.845.$$
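The computation in Table 22.9 can be retraced with a short sketch. The deviations are copied from the table; the function name `pearson_r` is an illustrative choice and not part of the text. It applies the same product-moment formula used above.

```python
from math import sqrt

# Absolute deviations from trend, thousands of short tons (Table 22.9).
by_product = [-2163.3, 1079.4, 1957.1, 4708.8, -832.6, -9567.9,
              2691.8, 3646.4, -4985.9, 1112.8, 5641.4, -3287.9]
beehive = [-584.6, 1187.6, 1048.8, 291.0, -1265.7, -1709.5,
           611.7, 704.9, -2255.9, 358.3, 2076.5, -463.2]

def pearson_r(xs, ys):
    """r = [N*SXY - SX*SY] / sqrt([N*SXX - SX^2][N*SYY - SY^2])."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

r = pearson_r(by_product, beehive)
print(round(r, 3))
```

Dropping the 1946 pair and recomputing is a quick way to see how strongly one large deviation can influence the coefficient.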

Problems in correlating time series. It must be evident that the 
value of the correlation coefficient is affected by the type of trend fitted to 
the data, and by the period to which it is fitted. If a period of 10 years 
is being correlated, it would not be logical to use for one series a section 
of a trend fitted over a 100-year period and for the other a trend fitted to 
data extending over 10 years only. The former trend would, in all likeli- 
hood, fail to pass through the approximate center of each cycle, and might 
not even touch some of the cycles. Consequently, the correlation coeffi- 
cient might understate or overstate the degree of relationship between the 
cycles of the two series. It must also be apparent that the use of an
inflexible trend for one series and a flexible trend for the other would 
produce similar results. If we wish to correlate cyclical movements, it 
seems best therefore to use a trend that goes approximately through the 
center of each cycle. It may be that no simple mathematical curve will 
be satisfactory and that a relatively subjective method may have to be 
resorted to, at least as a first approximation. 

Another problem to consider is whether the Pearsonian method of cor- 
relation, based on the second moments, is appropriate for correlating time 
series. The fluctuations of a time series are not usually distributed 
normally around the trend line. There are sometimes a few extreme 
deviations, which, when squared, largely determine the value of r. With 
this problem in mind, some authorities suggest the use of the rank method 
when the extreme deviations are particularly large. Another solution is 
the use of a formula based on first moments, rather than second.* In
view of the fact that interest frequently centers in whether two series are 
moving in the same general direction (positive or negative) at the same 
time, without regard to the magnitude either of their level or of their 
change, it may be that a method applicable to 2 X 2 tables (see pages 
480-482) would be appropriate. 

A further difficulty in correlating time series is that we have no logical 
basis for estimating the reliability of the coefficient of correlation. The 
chief objection to the use of any reliability test for r for time series is that 
the different observations are not randomly distributed — each observation 
in a time series is related to values in that series for preceding and subse- 
quent points of time. Furthermore, we cannot ordinarily generalize 
concerning the exact nature of this interrelationship. Perhaps this 
difficulty will become more obvious when we ask how many independent 
observations are contained in the cyclical relatives used in Table 22.7. 


* See "The Validity of Correlation in Time Sequences and a New Coefficient of
Similarity," by O. Gressens and E. D. Mouzon, Jr., Journal of the American Statistical
Association, Vol. XXII, December 1927, pp. 483-492. This method is further elucidated
and its relation to r explained by George R. Davies, in an article entitled "First
Moment Correlation," appearing in the Journal of the American Statistical Association,
Vol. XXV, December 1930, pp. 413-427. The formula is

^ Xs(2N - Sjsl) 

where s refers to the smaller of each pair of items when each series is expressed as
deviations from the mean in terms of average deviations. When
summing algebraically, s is positive if the signs of the paired deviations are alike, and
negative if they are unlike.
Although there are 12 years, there are not 12 independent observations.
There are only three complete cycles (measuring from trough to trough).
Are there, then, only three independent observations? There are more
than three, since each observation in a cycle is not completely dependent
on the preceding values. If we now had monthly data, would we have
144 independent observations for the 12 years? Of course not. But
how many we would have, it is impossible to say. What has just been
said may be clearer when the reader understands the concept of "degrees
of freedom." This is discussed in Chapter 24 and again, with particular
reference to correlation, in Chapter 26.

All of the preceding illustrations have dealt with chronological series 
expressed in physical terms. None were in monetary units. When a 
series is in terms of dollars, it should ordinarily be adjusted for price 
changes by dividing by an appropriate price index. Such a situation is 
encountered when we examine the relationship existing between the price 
and production of an agricultural crop such as oats, hay, wheat, or citrus 
fruits. The correlation present may be between price and production 
for the same years or between price for each year and production for the 
following year. 

The foregoing discussion has dealt only with correlation of two time 
series, although it was mentioned at the outset that we might correlate 
two or more time series. If one is undertaking to explain, statistically, 
the annual fluctuations in the price of pork, he would undoubtedly bring 
into his analysis not only the production of pork, but the price and pro- 
duction of corn, and probably the price and production of beef and other 
meats. A problem of this type is more complicated than those which 
we have considered here, since multiple correlation of several variables is 
involved. However, the procedures are exactly those set forth for multiple
and partial correlation in Chapter 21. Whatever the number of
variables being considered, appropriate adjustment must be made for the
trend of each series.

MONTHLY DATA 

When correlating monthly time series, it is necessary, not only to
adjust for trend, but to deseasonalize the data as well. If the data were
not deseasonalized, we would be, to a large extent, merely correlating the
seasonal fluctuations instead of the cyclical movements. In addition,
it is also usually desirable to smooth the adjusted data by means of a
short-term moving average (as explained in Chapter 16) in order to
remove the irregularities due to accidental movements.

Synchronous relationships. Sometimes one is interested in correlating
two monthly time series in order to ascertain whether the two
move together. Thus, such a correlation might be made if two organizations
issue indexes purporting to measure the same aspect of economic
activity. Or, a research bureau may be interested in knowing whether
an index of business conditions, computed upon the basis of a few component
series, agrees closely enough in its depicting of cyclical movements
with a more comprehensive index which is also more expensive to construct.
Again, one may be interested in comparing time series (for
example, department store sales) for two, or more, of the twelve Federal
Reserve districts.

Lag and lead. Frequently one is interested in finding a monthly 
time series which moves ahead of a second series and which may therefore 
be used to forecast the second series. The relationship which one hopes 





Chart 22.8. Two Illustrative Series Showing One Series Regularly 
Preceding the Other. 

to find is something like the ideal one illustrated in Chart 22.8, although
the cycles would almost never have the regularity shown in this chart.
In Chart 22.8, the forecasting index is seen to move, regularly, ahead of
the series to be forecasted. When such a situation obtains, the earlier-moving
series (that is, the forecasting index) is said to "lead" the other
series. Also, the later-moving series is said to "lag" the earlier-moving
series. One will very rarely find a lag-lead relationship as uniform as that
depicted in Chart 22.8. In fact, since 1941, lagging relationships between
economic time series have not been at all clear-cut, owing first to World
War II and then to the Korean War and to defense production. However,
some lagging relationships do appear, as indicated by the following
statement from the Thirty-Third Annual Report of the National Bureau
of Economic Research:* "Recently, we have laid plans for exploring how
one of the firmest and most important of the Bureau's findings about the

* A. F. Burns, Business Cycle Research and the Needs of Our Times, 33rd Annual
Report of the National Bureau of Economic Research, Inc., New York, 1953, p. 12.
business cycle might be put to current use; namely, that the cycle in
aggregate activity has been invariably preceded by a remarkably regular
cycle in the proportion of individual activities undergoing expansion."
Chart 22.9 shows the Federal Reserve Index of Production of Durable
Manufactures and the Federal Reserve Index of Nondurable Manufactures
for the period January 1946-December 1953. These indexes
were adjusted for seasonal movements by the Federal Reserve Board.
The writers removed trend and smoothed the accidental movements by
means of a three-month moving average weighted 1,2,1. The actual



Chart 22.9. Cyclical Movements of Federal Reserve Index of Production of
Durable Manufactures and of Index of Production of Nondurable Manufactures,
1946-1953. Data from Table 22.10 and from worksheets (not shown) for the
years omitted from that table. Both indexes were adjusted for trend and for seasonal
and irregular movements, and were expressed as percentage deviations.

situation depicted in Chart 22.9 is much different from the illustrative
one shown in Chart 22.8, where one series regularly preceded the other.
Examination of Chart 22.9 reveals several interesting points: the low
points in the Index of Nondurable Manufactures appear to precede
similar low points in the Index of Durable Manufactures in 1947, 1948,
1949, and 1952; the high point in the Index of Nondurable Manufactures
in 1946 seems to precede by some months a high in the other index; in
1950, the high point for the Index of Durable Manufactures precedes the
high for the Index of Nondurable Manufactures.
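The three-month moving average weighted 1,2,1 that was used to smooth the indexes can be sketched as follows. The function name and the sample values are hypothetical; the sketch assumes that the first and last months, which have no neighbor on one side, are simply dropped.

```python
def smooth_121(values):
    """Three-term moving average with weights 1, 2, 1.

    Each smoothed value is centered on the middle month, so the result
    has two fewer items than the input.
    """
    return [(before + 2 * middle + after) / 4
            for before, middle, after in zip(values, values[1:], values[2:])]

# Hypothetical percentage deviations for five consecutive months.
sample = [4.0, 8.0, 4.0, 0.0, 4.0]
smoothed = smooth_121(sample)
print(smoothed)  # [6.0, 4.0, 2.0]
```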

In general, the Index of Nondurable Manufactures seems to precede 
the other index. We shall compute several correlation coefficients to 
ascertain when the closest agreement is present. First, correlating the 
two series synchronously, we find r = +0.682. Next, pairing the values
with the Index of Nondurable Manufactures leading the Index of
Durable Manufactures by one month, we obtain r = +0.715. (Here
the pairing starts out with January 1946 for the Index of Nondurable
Manufactures paired with February 1946 for the Index of Durable Manufactures
and finishes with November 1953 for the leading series paired
with December 1953 for the lagging series.) Since the lag between the
two series is none too clear in Chart 22.9, we try a pairing with the Index
of Durable Manufactures leading by one month. This yields r =
+0.637, which is lower than the value first obtained, so we will not pursue
the illustration further in this direction.

Trying, now, two months' lead for the Index of Nondurable Manufactures,
the computations for which are indicated in Table 22.10, we
obtain r = +0.728, which is larger than the coefficient for a one-month
lead of that index. (Chart 22.10 shows the two indexes, with the Index
of Durable Manufactures moved two months to the left.) Next, we
compute the correlation coefficient with the Index of Nondurable Manufactures
leading three months, and get r = +0.688, which is smaller than
the value just obtained for two months' lead. Little is to be gained by
computing additional values of r for the purposes of this illustration, so
we will summarize the results, as follows:


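Leading series                                        Value of r

Index of Durable Manufactures leads by:
    One month ........................................ +0.637
Synchronous .......................................... +0.682
Index of Nondurable Manufactures leads by:
    One month ........................................ +0.715
    Two months ....................................... +0.728
    Three months ..................................... +0.688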


The highest correlation coefficient was found when the Index of Nondurable
Manufactures led by two months. However, that index would
not serve as a very satisfactory forecasting series for the Index of Durable
Manufactures, because the value of r does not indicate close enough
agreement.
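The search over several leads described above may be sketched as follows. The two series here are synthetic (a regular 24-month wave and the same wave two months later), assumed only to show the pairing; they are not the Federal Reserve indexes, and `lead_r` is an illustrative name.

```python
from math import sqrt, sin, pi

def pearson_r(xs, ys):
    """Product-moment r from the raw sums, as in Table 22.10."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def lead_r(first, second, lead):
    """r with `first` leading `second` by `lead` months.

    A negative `lead` means that `second` leads `first`. Each month of
    lead removes one pair, so 96 months with a lead of 2 give 94 pairs.
    """
    if lead > 0:
        return pearson_r(first[:-lead], second[lead:])
    if lead < 0:
        return pearson_r(first[-lead:], second[:lead])
    return pearson_r(first, second)

# Synthetic monthly series, 96 months: `second` repeats `first` two months later.
first = [sin(2 * pi * t / 24) for t in range(96)]
second = [sin(2 * pi * (t - 2) / 24) for t in range(96)]

results = {lead: lead_r(first, second, lead) for lead in range(-1, 4)}
best_lead = max(results, key=results.get)
for lead, value in results.items():
    print(lead, round(value, 3))
print("highest r at lead", best_lead)
```

With real data the coefficients change gradually from one lead to the next, as in the summary above, rather than peaking at exactly 1.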

It is not always necessary for one time series to lead another one in 
order for it to be useful as an indicator of the behavior of the second series. 
The Bureau of Business and Economic Research of the University of 
Maryland reports* that Baltimore bank debits are correlated +0.9998
with Maryland bank debits, and that Maryland bank debits are correlated 

* University of Maryland, Bureau of Business and Economic Research, Studies
in Business and Economics, Vol. 6, No. 3, December 1952, "Maryland Economic
Indices," p. 10.

+0.9853 with bank debits in the United States. The Bureau notes that
"turns in direction of the Baltimore series may be expected to indicate
turns in the State and the Nation." The usefulness of this relationship
lies in the fact that data for Baltimore would be available more promptly
than are data for Maryland or for the United States.

TABLE 22.10

Determination of Correlation Between Federal Reserve Index of Nondurable
Manufactures and Index of Durable Manufactures, January 1946-
December 1953, with the Index of Nondurable Manufactures Leading
by Two Months

(Both indexes have 1947-1949 as the base, are adjusted for seasonal, trend, and irregular movements,
and are expressed as percentage deviations.)

Year and       Index of        Index of
month          Nondurable      Durable             XY          X²          Y²
               Manufactures    Manufactures
               X               Y

1946: Jan.       +0.1            -16.6              ...        0.01         ...
      Feb.       +1.9            -20.5              ...        3.61         ...
      Mar.       +1.6            -12.0          -  1.20        2.56      144.00
      Apr.       +0.3            - 6.9          - 13.11        0.09       47.61
      May        -0.8            - 8.1          - 12.96        0.64       65.61
      June       -2.2            - 4.7          -  1.41        4.84       22.09
      July       -3.0            - 0.2          +  0.16        9.00        0.04
      Aug.       -2.0            + 2.3          -  5.06        4.00        5.29
      Sept.      -0.7            + 4.6          - 13.80        0.49       21.16
      Oct.       +0.8            + 5.8          - 11.60        0.64       33.64
      Nov.       +2.6            + 5.2          -  3.64        6.76       27.04
      Dec.       +3.1            + 4.8          +  3.84        9.61       23.04
      ...        ...             ...                ...        ...         ...
1953: July       +2.0            + 3.8          + 14.06        4.00       14.44
      Aug.       +0.4            + 2.5          +  7.50        0.16        6.25
      Sept.      -1.1            + 0.1          +  0.20        1.21        0.01
      Oct.       -2.2            - 2.4          -  0.96        4.84        5.76
      Nov.       -3.6            - 5.4          +  5.94        ...        29.16
      Dec.       -5.3            - 8.4          + 18.48        ...        70.56

Total             3.3             48.4       + 1,554.46      972.87    4,707.88

Indication of pairing: each X value is paired with the Y value two months later. The
product XY and the value of Y² are entered on the line of the Y value; X² is entered on
the line of the X value.

Deseasonalized data from Federal Reserve Bulletin, December 1953, pp. 1326-1327, and mimeographed
releases.

$$r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}$$

$$= \frac{94(1,554.46) - (3.3)(48.4)}{\sqrt{[94(972.87) - (3.3)^2][94(4,707.88) - (48.4)^2]}} = +0.728.$$

Procedure for use of lead and lag as an aid in forecasting. If it 
is desired to make use of a lead-lag relationship to assist in forecasting the 
cyclical movements of a series (the lagging series), the procedure may be 
as follows: 
1. Plot the lagging series on a large sheet of semi-transparent cross-section
paper. The exploratory work in this and the following three
steps may be done with data adjusted for seasonal. Trend (unless it is
very marked) and irregular movements need not be removed, although it
is better if they have been eliminated.

2. Consider what series may logically be expected to precede the lag- 
ging series, and plot each of these series on a separate sheet of semi- 
transparent graph paper. The horizontal scales used in Steps 1 and 2 



Chart 22.10. Cyclical Movements of Federal Reserve Index of Production
of Durable Manufactures and of Index of Production of Nondurable Manufactures,
1946-1953, with Index of Durable Manufactures Moved Two Months
to the Left. Data from Table 22.10 and from worksheets (not shown) for the years
omitted from that table. Both series were adjusted for trend and for seasonal and
irregular movements, and were expressed as percentage deviations.

must be the same. The vertical scales may be adjusted so that the 
fluctuations of the series which are to be compared are roughly the same. 

3. Place the chart of one of the presumably leading series on top of the 
chart for the lagging series (or vice versa), place both above a source of 
light, and move the chart of the lagging series to the left until the closest 
agreement between the cyclical movements of the two series is obtained. 
Chart 22.10 shows how this might appear. If closer agreement is 
obtained by moving the leading series to the left, then it doesn’t lead — 
it lags! 

4. Repeat Step 3 for any other series which might move ahead of the 
series for which forecasts are desired.

5. When a series has been found that appears regularly to precede the 
lagging series, adjust both series for trend and irregular movements and 
compute the value of r for the best visual estimate of the lead shown by
the graphs of these adjusted series. 

6. Compute the values of r for longer and shorter leads than that used
in Step 5 in order to arrive at the highest value of r. This was two
months in the preceding illustration.

7. If the value of r is high enough to warrant doing so, an estimating
equation of the type

$$Y_c = a + bX,$$

or possibly a non-linear equation, may be computed. Here, $Y_c$ is the
estimated cyclical value for the lagging series and X is the observed
cyclical value of the leading series. If the probing in Steps 3 and 4 should
reveal more than one leading series, a forecasting equation such as those
for multiple correlation (Chapter 21) would be used.
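Step 7 can be sketched as a least-squares fit of the lagging series on the leading series after the two have been paired at the chosen lead. Everything in the sketch is hypothetical: the six values of each series, the one-month lead, and the names `fit_line`, `leader`, and `lagger` are assumptions made for illustration only.

```python
def fit_line(xs, ys):
    """Least-squares constants a and b for the equation Yc = a + bX."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

lead = 1  # hypothetical lead, in months, chosen in Steps 5 and 6

# Hypothetical cyclical values; each lagger value follows the leader one month later.
leader = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lagger = [4.0, 5.0, 8.0, 11.0, 14.0, 17.0]

# Pair each leader value with the lagger value `lead` months later.
a, b = fit_line(leader[:-lead], lagger[lead:])

# The latest leader value has no partner yet, so it supplies the forecast.
forecast = a + b * leader[-1]
print(a, b, forecast)  # 2.0 3.0 20.0
```

The forecast is only as dependable as the lead-lag relationship itself, which is the subject of the cautions that follow in the text.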

One investment advisory service* has been using multiple correlation,
with one independent variable leading by a year, to obtain a rating for
stocks. In this analysis, the dependent variable is the average annual
price of a stock, while the independent variables are: annual dividends
per share, annual earnings per share, the average monthly price of the
stock for the preceding year, a measure of "market climate" or sentiment,
and time. Market climate itself is obtained by a process of multiple
correlation and represents the difference, over a long period of time,
between a composite stock price average and estimates of that average
based on earnings, dividends, and time.

The slowness with which most economic and business data are reported 
and the scarcity of time series on a basis shorter than a month are factors 
that impair the usefulness of correlation as a forecasting device. It is 
quite possible that weekly, daily, or hourly data might bring to light 
relationships which are known and utilized only by a few "insiders."
The theorist argues that all economic processes are interrelated. It does 
not seem logical that the cause-and-effect relationships which supposedly 
surround us on every side must always take a month or more for their 
development. There must be many that work out in a few days, a few 
hours, or nearly instantaneously. If the market hears that a new industrial
use has suddenly been announced for copper, it does not wait weeks
or even hours to show its reaction in a price change. As data are made 
available upon a weekly, daily, or more frequent basis, it is conceivable 
that very useful lag-lead relationships may be obtained. 

* The Value Line Investment Survey, 5 East 44th Street, New York City.

Some cautions. It may have been noticed that the heading of the
preceding section referred to the use of lead and lag as an aid in forecasting.
A leading correlation that has been observed for a number of
years in the past will not be applicable to future months unless the rela- 
tionship between the series continues as before. If underlying economic 
(or other) conditions change, the relationship may be altered. Fore- 
casting by this, or any other, device should be attempted only in connec- 
tion with a thorough knowledge of the series under consideration and of 
the conditions affecting those and allied series. 

The use of lead-lag correlations in forecasting is also subject to other 
objections or shortcomings. Among these are: 

1. As pointed out in Chapter 19, the value of r may be unduly influ- 
enced by one or a few extreme values. Some statisticians even argue 
that one's visual impression of the amount of lead is preferable.

2. The lag may be different at recession from what it is at revival. 

3. Interest often centers mainly on turning points, while r gives equal 
importance to leads and lags at all phases of the cycle. It may be 
profitable to be able to foretell merely when to expect a change in direc- 
tion, even though the amount of change cannot be forecast. 

4. It is a laborious process to compute r for a large number of lead-lag 
hypotheses. 

5. In addition to criticisms of the coefficient of correlation as a measure
of relationship for time series, one may also criticize the nature of the 
variations correlated, arguing that a person can more accurately predict 
the future with respect to the present than he can with respect to some 
normal, which is often difficult to estimate correctly. 

In Chapter 26, attention will be given to the reliability of correlation 
coefficients computed from random samples. Since the coefficients 
obtained from lead-lag relationships are not for random samples, the 
procedures in Chapter 26 are not applicable to the correlation coefficients 
for leading and lagging series. 





Symbols Used in Chapter 23

A: when tossing a die, the occurrence of a white side. A has no numerical
value.

$\alpha_3$: lower-case Greek alpha; a measure of skewness, $\sqrt{\beta_1}$. See Chapter 10.

B: when tossing a die, the non-occurrence of a white side. B has no
numerical value.

$\beta_1$, $\beta_2$: lower-case Greek beta; respectively, measures of skewness and
kurtosis. See Chapter 10.

c: a correction for skewness sometimes used in fitting a logarithmic normal
curve.

$C_0, C_1, C_2, \ldots$: the binomial coefficients.

d': deviation, in terms of class intervals, of an X value from $\bar{X}_d$.

e: 2.71828; the limit of the series $1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots$

f: a frequency.

$F_1$: in fitting the second-approximation curve, the normal-curve
areas of Appendix E.

$F_2$: in fitting the second-approximation curve, the tabled values of
Appendix F which, when multiplied by $\alpha_3$, give the modification for
skewness.

h: in coin tossing, the occurrence of a head.

i: the class interval.

k: the number of samples.

N: the number of items in a sample.

$\nu_1$, $\nu_2$, $\nu_3$: lower-case Greek nu; the first, second, and third moments about
a selected origin. See Chapter 10.

p: the proportion of occurrences in a sample.

$\pi$: lower-case Greek pi; in the expression for the normal curve, the constant
3.14159; in the binomial, the proportion of occurrences in a
population.

$\pi_2$, $\pi_3$: lower-case Greek pi; the second and third moments about $\bar{X}$.
See Chapter 10.

q: the proportion of non-occurrences in a sample.

Q: the quartile deviation or semi-interquartile range. See Chapter 10.

$Q_1$, $Q_2$, $Q_3$: the quartiles. See Chapter 9.

s: the standard deviation of a sample. See Chapter 10.

$s_{\log}$: the standard deviation of the logarithms of a series of sample values.

$Sk_{\log}$: a coefficient of skewness based on the logarithms of the quartiles.

$\sigma$: lower-case Greek sigma. The standard deviation of a population.

$\hat{\sigma}$: the estimated standard deviation of a population, computed from a
single sample. Referred to as "sigma caret" or "sigma hat." See
Chapter 24.

t: in coin tossing, the occurrence of a tail or the non-occurrence of a head.

$\tau$: lower-case Greek tau; the proportion of non-occurrences in a population.

x: $X - \bar{X}$.

X: a value of the X-series.

$\bar{X}$: the arithmetic mean. See Chapter 9.

$\bar{X}_d$: a designated mean. See Chapter 9.

$\bar{X}_{\log}$: the arithmetic mean of a series of logarithms.

$x_{\log}$: $\log X - \bar{X}_{\log}$.

$Y_c$: a computed ordinate of a fitted curve.

$Y_o$: the computed ordinate of the normal curve at $\bar{X}$.

proportionate area under a curve from X to X. 



CHAPTER 23 


Describing a Frequency Distribution 
by a Fitted Curve 


A frequency distribution usually represents a sample drawn from a 
much larger population or universe. Even though a sample is composed 
of but a few hundred or a few score items, it may be reasonably
representative of the larger universe from which it was drawn. Since it is
virtually never possible to measure all of the individuals or items com- 
prising a universe, we must form our notion of the larger group from a 
study of a sample. We may therefore fit any one of a number of types of 
curves to a frequency distribution in order to attempt to describe what 
appears to be the general form of the curve for the entire population. 

The purpose in fitting a curve to a frequency distribution may be any 
one of the following: 

(1) We may wish to ascertain whether a given curve describes the
general shape of the distribution. For example, we may wish to demonstrate
that the chance errors involved when making repeated measurements
of the same object or phenomenon may be described by a normal
curve. Chart 23.1 is a normal curve and Chart 23.2 shows such a curve
fitted to a series of repeated measurements.

(2) Somewhat similar to the foregoing is the fitting of a curve to values 
obtained from repeated samples taken from the same population. An 
illustration of this is included as Exercises XV and XVI in the third 
edition of the Workbook* designed to accompany this text. In those 
exercises, a normal curve is fitted to a frequency distribution of arithmetic 
means computed from random samples. While sample arithmetic means 
tend to form a normal curve around the arithmetic mean of the popula- 
tion, other statistical values may form other types of curves. Further 

* F. E. Croxton, Workbook in Applied General Statistics, Third Edition. Prentice-Hall,
Inc., New York, 1950.

consideration will be given to the behavior of values computed from
samples in Chapters 24, 25, and 26.

(3) It may be desired to generalize concerning the proportions of items 
which should be expected to fall above, below, or between certain values. 
For example, we may take the case of fitting a curve to a frequency dis- 
tribution of the length of life of incandescent lamp bulbs; from such a 
procedure we are enabled to infer what proportion might, in general, be 
expected to burn 1,500 hours or more (or more or less than any specified 



Chart 23.1. The Normal Curve.


number of hours). Similarly, in the case of the data shown in Charts 
23.5 and 23.6, we may determine the number of items which in general 
would be expected to occur above, below, or between any two X values. 
In like fashion, the life insurance actuary may fit a curve to, or graduate 
data having to do with, deaths classified by age and thus determine the 
expected number of individuals dying during each year of life or surviving
given ages. 

(4) Sometimes it is possible to determine, from a curve fitted to a given 
distribution, the probable distribution of values in a closely associated 
series. For example, a normal curve fitted to the measurements of the 
circumferences of men's necks enables us to ascertain the probable number
of collars of each size which would be needed. This has been done in 
Chart 23.8 and Table 23.5. 
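Purpose (3) above can be illustrated with a brief sketch. The mean of 1,400 hours and the standard deviation of 200 hours are hypothetical values assumed for the example; they are not taken from any lamp-life data in the text. Once a normal curve has been fitted, the proportion expected above, below, or between given values follows from the areas under the curve.

```python
from statistics import NormalDist

# Hypothetical fitted normal curve for length of life of lamp bulbs.
mean_hours = 1400.0  # assumed arithmetic mean
sd_hours = 200.0     # assumed standard deviation
lamp_life = NormalDist(mu=mean_hours, sigma=sd_hours)

# Proportion expected to burn 1,500 hours or more.
share_1500_or_more = 1 - lamp_life.cdf(1500)

# Proportion expected to burn between 1,200 and 1,600 hours.
share_between = lamp_life.cdf(1600) - lamp_life.cdf(1200)

print(round(share_1500_or_more, 4))
print(round(share_between, 4))
```

Multiplying either proportion by the number of items gives the expected number of items in that range.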

This chapter will not attempt a comprehensive treatment of the topic 
of fitting frequency curves. We shall consider only the symmetrical 
curve known as the normal curve, and then, briefly, binomials and two
of the simpler skewed curves. 

THE NORMAL CURVE 

Development of the normal curve. The concept of the normal
curve (pictured in Chart 23.1) appears to have been originally developed
by Abraham De Moivre and explained in 1733 in a mathematical treatise*

Chart 23.2. Normal Curve Fitted to 144 Measurements of the
Length of a Line. (Vertical axis: number of measurements; horizontal axis:
length in feet.) Measurements from L. D. Weld, Theory of Errors and
Least Squares, p. 147, The Macmillan Company, New York, 1916.

which its author believed had no practical applications other than as a 
solution of problems encountered in games of chance. Gauss later used 
the curve to describe the theory of accidental errors of measurements 
involved in the calculation of orbits of heavenly bodies. Because of 
Gauss' work, this curve is sometimes referred to as the Gaussian curve.
Chart 23.2 shows a column diagram of 144 measurements of a line† and


* Approximatio ad Summam Terminorum Binomii $(a + b)^n$ in Seriem expansi, Nov.
12, 1733, being a second supplement to Miscellanea Analytica, 1730. See Karl Pearson,
"Historical Note on the Origin of the Normal Curve of Errors," Biometrika, Vol. 16
(1924), pp. 402-404; also, Helen M. Walker, Studies in the History of Statistical Method,
pp. 13-17, 22-23, Williams and Wilkins, Baltimore, 1929.

† The 144 measurements are from L. D. Weld, Theory of Errors and Least Squares,
p. 147, The Macmillan Company, New York, 1916.






a normal curve of error fitted to these measurements. Concerning the
normal curve, it will be observed: (1) that small errors are more frequent
than large ones, (2) that very large errors are unlikely to occur, and (3)
that positive and negative errors of the same numerical magnitude are 



Chart 23.3. Apparatus to Illustrate the Expansion of the Binomial
$(\frac{1}{2} + \frac{1}{2})^n$.

equally likely to occur. Because the normal curve has been used exten- 
sively to describe errors of measurement, it is sometimes referred to as the 
"normal curve of error." However, this term is misleading, since errors
of measurement, even though unbiased, do not always follow the normal
curve.*

Explanation of the formula. Chart 23.3 pictures an apparatus 
which will help us to understand the formula for the normal curve. The 
device consists of a number of troughs, open at one end and placed as 

* See N. R. Campbell, An Account of the Principles of Measurement and Calculation,
Ch. IX, especially p. 182, note 1, Longmans, Green & Co., London, 1928.
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shown in section A of Chart 23.3. Trough d is filled with sand or some 
similar granular substance. If the apparatus is tipped so that the left- 

hand side rises (section B of Chart 
23.3), the sand in trough d will flow i 
into trough y and | into trough fc. This 
represents the binomial (i + i) . If the 
right-hand side of the machine is then 
raised (section C of Chart 23.3), the 
sand from j will flow i into c and 1 into 
d, while the sand from k will flow ^ 
into d and i int'o e. Of the total 
amount of sand, we now have i in c, i 
in d, and i in e, representing the expan- 
sion of the binomial (i + iy. Again 
tipping the device, as in section D of 
Chart 23.3, i of the sand from c flows 
into ij and i into j; i of the sand from d 
flows into j, and ^ into k ; and i of the 
sand from e flows into fc, and i into Z. The result is that i of all the sand 
is in f is in f is in fc, and i is in Z, representing the expansion of the 
binomial + I)®. Tipping the apparatus as in section E of Chart 23.3 
causes the sand to flow into 
6, A into Cy A hito d, into e, 
and xV into /, representing the 
expansion of (i + i)^. Once 
more tipping the machine (sec- 
tion F of Chart 23.3) results in 
putting of the sand into A, 
into if into if into fc, 
into Z, and into w, which is 
the expansion of (i + i)^* 
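
The successive tippings of the apparatus can be imitated by a short Python sketch. It is offered only as an illustration: the function names `tip` and `sand_after` are hypothetical, and the troughs are simply numbered from one side to the other rather than lettered as in Chart 23.3. Exact fractions are used so that the shares agree with the binomial terms.

```python
from fractions import Fraction


def tip(troughs):
    """Send half of each trough's sand to each of the two troughs it feeds."""
    result = [Fraction(0)] * (len(troughs) + 1)
    for position, amount in enumerate(troughs):
        result[position] += amount / 2       # half flows to one side
        result[position + 1] += amount / 2   # half flows to the other side
    return result


def sand_after(tips):
    """Share of the sand in each trough after the given number of tippings."""
    troughs = [Fraction(1)]  # all of the sand starts in a single trough
    for _ in range(tips):
        troughs = tip(troughs)
    return troughs


for n in range(1, 6):
    print(n, [str(share) for share in sand_after(n)])
```

Fractions are printed in lowest terms, so a share such as 4/16 appears as 1/4.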

While the above illustration 
is instructive and gives us a 
picture of the expanded bino- 
mial, the device would become 
clumsy if we attempted to carry 
the expansion of the binomial 
much farther. We may obtain 
similar results by tossing coins 
— a procedure which eliminates 
the necessity of constructing any apparatus. It is assumed that we are 
tossing perfect coins which are evenly balanced and which will not stand 


Chart 23.4A. Expected Results of 10,000 Tosses of Four Coins.

Chart 23.4B. Expected Results of 10,000 Tosses of Six Coins.
on edge. With such a coin, the chances of throwing a tail or a head are
identical and may be expressed by $\tfrac{1}{2}$.

If two coins are tossed simultaneously, we may obtain either no heads 
(two tails), a tail and a head, or two heads. In order for no heads to 
appear, both coins must fall tails up. To obtain one head, one coin may 
show a tail and the other a head, or the first coin may show a head, the 
other a tail. Two heads may appear only if both coins show heads. 
Since one head may occur in two ways, while no heads may occur in but 
one way, it follows that there is twice as great a probability of throwing one head as of throwing no heads. Similarly, there is twice as great a chance of throwing one head as there is of throwing two heads. We may express the probabilities arising from tossing two coins by

$$(\tfrac{1}{2}t + \tfrac{1}{2}h)^2,$$

in which the exponent 2 indicates the number of coins being tossed. Expanding this binomial gives

$$\tfrac{1}{4}t^2 + \tfrac{1}{2}th + \tfrac{1}{4}h^2.$$

Therefore, if two perfect coins are thrown 1,200 times, we could expect to obtain $t^2$ (no heads) 300 times, $th$ (one head) 600 times, and $h^2$ (two heads) 300 times.

Chart 23.4C. Expected Results of 10,000 Tosses of Ten Coins. The probability of each combination is indicated by the binomial expansion shown under each part of Chart 23.4. For ten coins the expansion is

$$(\tfrac{1}{2}t + \tfrac{1}{2}h)^{10} = \tfrac{1}{1024}t^{10} + \tfrac{10}{1024}t^9h + \tfrac{45}{1024}t^8h^2 + \tfrac{120}{1024}t^7h^3 + \tfrac{210}{1024}t^6h^4 + \tfrac{252}{1024}t^5h^5 + \tfrac{210}{1024}t^4h^6 + \tfrac{120}{1024}t^3h^7 + \tfrac{45}{1024}t^2h^8 + \tfrac{10}{1024}th^9 + \tfrac{1}{1024}h^{10}.$$

If three coins are tossed, we have the expression 


$$(\tfrac{1}{2}t + \tfrac{1}{2}h)^3 = \tfrac{1}{8}t^3 + \tfrac{3}{8}t^2h + \tfrac{3}{8}th^2 + \tfrac{1}{8}h^3,$$


indicating that, if 1,200 throws were made, there should be no heads 150 
times, one head 450 times, two heads 450 times, and three heads 150 times. 
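
The expected counts for two and three coins can be checked with a brief Python sketch. The function name `expected_heads` is an illustrative assumption; the computation is simply each term of the expanded binomial multiplied by the number of throws.

```python
from fractions import Fraction
from math import comb


def expected_heads(coins, throws):
    """Expected number of throws showing 0, 1, ..., `coins` heads."""
    half = Fraction(1, 2)
    # Each term of (1/2 t + 1/2 h)^coins, multiplied by the number of throws.
    return [throws * comb(coins, heads) * half ** coins
            for heads in range(coins + 1)]


print([int(count) for count in expected_heads(2, 1200)])
print([int(count) for count in expected_heads(3, 1200)])
```

The same function may be called with 4, 6, or 10 coins and 10,000 throws to obtain the heights plotted in Chart 23.4.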

The results to be expected from tossing 4 coins are shown in section A of 
Chart 23.4, while the results to be expected from tossing 6 and 10 coins are 
shown, respectively, in parts B and C. All of these curves are symmetri- 
cal, and, as the number of coins tossed becomes greater, the curve becomes 
smoother. When ten coins are tossed, there are eleven points to be 
plotted (see part C); but if 100 coins were tossed, there would be 101 points to plot and the curve would appear virtually the same as that of Chart 23.1. In fact, it can be shown that, as N approaches infinity, $(\tfrac{1}{2} + \tfrac{1}{2})^N$ approaches as a limit

$$Y_c = \frac{1}{\sigma\sqrt{2\pi}}\, e^{\frac{-x^2}{2\sigma^2}},$$

which is the expression for the normal curve. The symbols are as follows:

$Y_c$ = the computed height of an ordinate at distance $x$ from the arithmetic mean;

$\sigma$ = the standard deviation of the population;

$\pi$ = the constant, 3.14159; $\sqrt{2\pi}$ = 2.5066;

$e$ = the constant, 2.71828, the base of the Naperian system of logarithms; and

$x$ = a selected deviation from the arithmetic mean.


Substituting the two constants mentioned above, we may write the equation

$$Y_c = \frac{1}{2.5066\sigma}\, 2.71828^{\frac{-x^2}{2\sigma^2}}.$$
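
The approach of the symmetrical binomial to the normal curve can be examined numerically. The following Python sketch is an illustration under a stated assumption: it uses the standard result, not derived in the text, that the number of heads in N tosses of a perfect coin has a standard deviation of $\sqrt{N}/2$. The function names are hypothetical.

```python
from math import comb, exp, pi, sqrt


def normal_ordinate(x, sigma):
    """Height of the normal curve at deviation x from the mean."""
    return exp(-x * x / (2 * sigma * sigma)) / (sigma * sqrt(2 * pi))


def binomial_term(n, heads):
    """Probability of exactly `heads` heads when n perfect coins are tossed."""
    return comb(n, heads) / 2 ** n


def largest_gap(n):
    """Largest difference between a binomial term and the normal ordinate."""
    sigma = sqrt(n) / 2  # assumed standard deviation of the number of heads
    return max(abs(binomial_term(n, heads) - normal_ordinate(heads - n / 2, sigma))
               for heads in range(n + 1))


for n in (4, 10, 100, 1000):
    print(n, largest_gap(n))
```

The printed gaps shrink as the number of coins grows, which is the behavior the limit describes.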


FITTING THE NORMAL CURVE 

In Chart 23.2 a normal curve was shown fitted to a series of measure- 
ments of a line. It will be observed that those figures were repeated 
measurements of the same thing. In Chart 23.5 we have a different type 
of data, representing measurements of a number of individuals from a 
homogeneous population. The chance errors involved in repeated 
measurements of the same thing not infrequently follow a normal curve. 
However, the measurements of a number of different individuals in
respect to some characteristic may or may not follow such a curve. A 
distribution of the heights of a homogeneous group of adult individuals, 
for example, could be expected to be essentially normal, but a distribution 
of the weights of the same individuals would be noticeably skewed to the 
right. While the basal diameter of the egg-capsules of the snails in Chart 
23.5 may be described by the fitted normal curve, it is quite likely that 
the weights of these same eggs would show definite skewness. 

The fitted curve in Chart 23.5 indicates the shape of the distribution we should expect if our sample were much larger, or if we had measured the entire population. It implies that, if a larger group were studied, we should find a few instances with basal diameters both smaller and larger than those found in the sample.

See G. U. Yule and M. G. Kendall, An Introduction to the Theory of Statistics, Hafner Publishing Co., New York, 1950, pp. 177-181.

Another limit of the binomial is the Poisson distribution, which the binomial approaches if one of the fractions is very small and N approaches infinity. Fitting the Poisson distribution is described in F. E. Croxton, Elementary Statistics with Applications in Medicine, Prentice-Hall, Inc., New York, 1953, pp. 201-206.

Fitting the normal curve to data of physical ability. Table 
23.1 shows a distribution of the distances which 303 high school freshman 
girls were able to throw a baseball. These data are akin to those from 
which Chart 23.5 was drawn in that they are measurements for a number 

Chart 23.5. Normal Curve Fitted to Basal Diameters of 99 Egg-Capsules of a Marine Snail, Sipho curtus. Data of basal diameters from Gunnar Thorson, Studies on the Egg-Capsules and Development of Arctic Marine Prosobranchs, p. 7, Meddelelser om Grønland udgivne af Kommissionen for Videnskabelige Undersøgelser i Grønland.


of different individuals. It may be observed that very few of the girls 
threw the baseball less than 45 feet and very few threw it 115 feet or 
farther. The column diagram of Chart 23.6 shows the data of Table 23.1.
To fit a normal curve to an observed frequency distribution, we rewrite
the equation

$$Y_c = \frac{Ni}{2.5066s}\, 2.71828^{\frac{-x^2}{2s^2}},$$


where N is the number of observations in the sample, 

i is the class interval of the sample distribution, and 
s is the standard deviation of the sample. 


We could use $\hat{\sigma}$, an estimate of $\sigma$, which is discussed in the following chapter, instead of s when fitting a normal curve to a set of observed data. However, we ordinarily prefer s, since it measures the dispersion of a sample of the observed size, rather than being an estimate of the dispersion in the population. Furthermore, for a frequency distribution having a large enough N to warrant the fit of a normal curve, the difference between s and $\hat{\sigma}$ is so slight as to have little or no effect on the fit. For the data of Table 23.1, for example, s = 20.95 feet and $\hat{\sigma}$ = 20.98 feet.

The complete fitting process consists of two steps: first, the determination of the values of a number of ordinates in order to ascertain the exact outline of the fitted curve, and, second, the computation of the proportionate areas for the portions of the curve that are important to us.

TABLE 23.1

Baseball Throws for Distance by 303 First-Year High School Girls

Distance in feet          Number of girls
 15 but under  25                1
 25 but under  35                2
 35 but under  45                7
 45 but under  55               25
 55 but under  65               33
 65 but under  75               53
 75 but under  85               64
 85 but under  95               44
 95 but under 105               31
105 but under 115               27
115 but under 125               11
125 but under 135                4
135 but under 145                1
Total                          303

Data from Leonora W. Stewart and Helen West, The Froebel School, Gary, Indiana. Measurements were made in 1935.

Ordinates. Referring again to the formula for the normal curve,

$$Y_c = \frac{Ni}{2.5066s}\, 2.71828^{\frac{-x^2}{2s^2}},$$

it appears that we need the values of N, $\bar{X}$, and s in order to fit a normal curve to a distribution. Computing by procedures described in preceding chapters, we find $\bar{X}$ = 80.63 feet and s = 20.95 feet. As there were 303 girls, N = 303.

We shall first compute the ordinate to be erected at the mean. This is designated as $Y_0$ and is the maximum ordinate of the fitted curve. Since x = 0 at the mean, we have

$$Y_0 = \frac{303 \times 10}{2.5066 \times 20.95}\, 2.71828^{\frac{-0^2}{2(20.95)^2}}.$$

In the expression above, the exponent of 2.71828 is zero. Since a number raised to the zero power is one, $2.71828^{\frac{-0^2}{2(20.95)^2}} = 1$. It is apparent, then, that the expression $2.71828^{\frac{-x^2}{2s^2}}$ is always equal to 1 for the ordinate erected at the mean and

$$Y_0 = \frac{Ni}{2.5066s}.$$

Therefore,

$$Y_c = \frac{Ni}{2.5066s}\, 2.71828^{\frac{-x^2}{2s^2}} = Y_0\, 2.71828^{\frac{-x^2}{2s^2}}.$$

For the problem in hand,

$$Y_0 = \frac{303 \times 10}{2.5066 \times 20.95} = 57.7.$$

We now wish to erect enough additional ordinates on either side of $Y_0$ to enable us to sketch a reasonably smooth curve. If we select successive distances of 4.19 feet from the mean, we shall erect ordinates at steps of $\tfrac{1}{5}s$ from the mean. The first pair of ordinates (since the curve is symmetrical) are to be erected at x = ±4.19 feet from the mean (X = 84.82 and 76.44 feet), using the expression

$$Y_c = 57.7 \times 2.71828^{\frac{-(4.19)^2}{2(20.95)^2}}.$$

In order to determine the value $Y_c$, it is not necessary to compute $2.71828^{\frac{-(4.19)^2}{2(20.95)^2}}$ but merely to refer to Appendix D. Looking up the appropriate value of $\frac{x}{s}$, which in this case is $\frac{4.19}{20.95} = 0.20$, we find that

$$2.71828^{\frac{-(4.19)^2}{2(20.95)^2}} = 0.98020$$

and

$$Y_c = 57.7 \times 0.98020 = 56.6.$$


For the next pair of ordinates, x = ±8.38 feet (X = 89.01 feet and 72.25 feet) and

$$Y_c = 57.7 \times 2.71828^{\frac{-(8.38)^2}{2(20.95)^2}}.$$

Here the ratio $\frac{x}{s}$ is 0.40 and, referring to Appendix D, we have

$$Y_c = 57.7 \times 0.92312 = 53.3.$$

TABLE 23.2

Determination of Ordinates of Normal Curve Fitted to Data of Baseball Throws for Distance by First-Year High School Girls

($\bar{X}$ = 80.63 feet; s = 20.95 feet; $Y_0$ = 57.7)

Column (1): X, in feet, where ordinates are to be erected.
Column (2): x, in feet, deviation of X from $\bar{X}$.
Column (3): $\frac{x}{s}$.
Column (4): Proportionate height of ordinate, $2.71828^{\frac{-x^2}{2s^2}}$ (Appendix D).
Column (5): Height of ordinate [Col. 4 × $Y_0$].

   (1)        (2)       (3)       (4)       (5)
  13.59     -67.04     3.20     0.00598     0.3
  17.78     -62.85     3.00     0.01111     0.6
  21.97     -58.66     2.80     0.01984     1.1
  26.16     -54.47     2.60     0.03405     2.0
  30.35     -50.28     2.40     0.05614     3.2
  34.54     -46.09     2.20     0.08892     5.1
  38.73     -41.90     2.00     0.13534     7.8
  42.92     -37.71     1.80     0.19790    11.4
  47.11     -33.52     1.60     0.27804    16.0
  51.30     -29.33     1.40     0.37531    21.7
  55.49     -25.14     1.20     0.48675    28.1
  59.68     -20.95     1.00     0.60653    35.0
  63.87     -16.76     0.80     0.72615    41.9
  68.06     -12.57     0.60     0.83527    48.2
  72.25     - 8.38     0.40     0.92312    53.3
  76.44     - 4.19     0.20     0.98020    56.6
  80.63       0        0        1.00000    57.7
  84.82     + 4.19     0.20     0.98020    56.6
  89.01     + 8.38     0.40     0.92312    53.3
  93.20     +12.57     0.60     0.83527    48.2
  97.39     +16.76     0.80     0.72615    41.9
 101.58     +20.95     1.00     0.60653    35.0
 105.77     +25.14     1.20     0.48675    28.1
 109.96     +29.33     1.40     0.37531    21.7
 114.15     +33.52     1.60     0.27804    16.0
 118.34     +37.71     1.80     0.19790    11.4
 122.53     +41.90     2.00     0.13534     7.8
 126.72     +46.09     2.20     0.08892     5.1
 130.91     +50.28     2.40     0.05614     3.2
 135.10     +54.47     2.60     0.03405     2.0
 139.29     +58.66     2.80     0.01984     1.1
 143.48     +62.85     3.00     0.01111     0.6
 147.67     +67.04     3.20     0.00598     0.3


The process of determining the heights of the ordinates can be handled most expeditiously by use of a table similar to Table 23.2. The ordinates in the upper and lower parts of the table are identical, since the fitted curve is symmetrical.
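
The ordinates of such a table can also be computed directly rather than looked up. The Python sketch below is illustrative only: the names `max_ordinate`, `ordinate`, and `rows` are assumptions, and the exponential is evaluated by the computer in place of the appendix table of proportionate heights.

```python
from math import exp, pi, sqrt

N, INTERVAL, MEAN, S = 303, 10, 80.63, 20.95  # values for the baseball throws


def max_ordinate(n, interval, s):
    """Height of the fitted curve at the mean: N * i / (2.5066 * s)."""
    return n * interval / (sqrt(2 * pi) * s)


def ordinate(x, y0, s):
    """Height of the fitted curve at deviation x from the mean."""
    return y0 * exp(-x * x / (2 * s * s))


y0 = max_ordinate(N, INTERVAL, S)

# One row per ordinate, erected at steps of one-fifth of s out to 3.2 s.
rows = []
for step in range(-16, 17):
    x = step * 0.2 * S
    rows.append((round(MEAN + x, 2),             # X, where the ordinate stands
                 round(x, 2),                    # x, deviation from the mean
                 round(abs(x) / S, 2),           # x / s
                 round(ordinate(x, y0, S), 1)))  # height of the ordinate

for row in rows:
    print(row)
```

Each printed row holds X, x, x/s, and the height of the ordinate, in that order.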

The fitted curve is shown in Chart 23.6. It follows the general shape of 
the sample, but smooths out the irregularities and indicates what might 
be expected if the performance of a very large number of comparable girls 
could be recorded. What we have done so far gives merely the shape of
the fitted curve and a visual impression of the suitability of the fit, which 
appears good in this instance. 

Chart 23.6. Normal Curve Fitted to Data of Baseball Throws for Distance by First-Year High School Girls. Data from Tables 23.1 and 23.2.

Areas. We have not yet undertaken to say what proportion of high school freshman girls may be expected to throw a baseball: (1) any specified number of feet or more, (2) any specified number of feet or less, or (3) a distance equal to or greater than one specified value but equal to or less than another larger value. Neither have we attempted to say what proportion of girls may be expected to fall into each of the various classes of the frequency distribution. Expected frequencies are ascertained by integrating the fitted curve. However, the procedure is greatly simplified, and no knowledge of integration is needed, if we make use of a table of the areas under the normal curve such as Appendix E. This appendix gives the proportionate area under the curve which is between an ordinate at $\bar{X}$ and an ordinate at specified $\frac{x}{s}$ distances in either direction (not both directions) from $\bar{X}$. This statement is illustrated by the small chart shown with Appendix E. The largest proportionate area shown in Appendix E is 0.50, since the area under the entire curve is 1.0.

To ascertain the proportion of girls that may be expected to throw a baseball 100 feet or more, we first determine the proportion that may be expected between the values of $\bar{X}$ = 80.63 feet and X = 100 feet and then subtract this proportion from 0.50. At X = 100 feet, x = 100 − 80.63 = 19.37 feet, and, since s = 20.95,

$$\frac{x}{s} = \frac{19.37}{20.95} = 0.92.$$

Referring to Appendix E, it appears that 0.3212 of the area is between the two values, and therefore 0.50 − 0.3212 = 0.1788, or about 18 per cent, of the area is at or beyond X = 100 feet.

If we wish to know what proportion of girls may be expected to throw 
a baseball 50 feet or less, the procedure parallels that just given. The 
reader should work this out for himself. The answer is 7.2 per cent. 

We can avoid the subtractions involved in the two preceding paragraphs 
if we refer to Appendix G, which shows areas in one tail of the normal
curve. This appendix and Appendix H, which gives areas in two tails 
of the normal curve, will be particularly useful in connection with part 
of the subject matter of Chapter 24. 

To determine the proportion of girls who may be expected to throw a baseball between 87 and 100 feet, we compute the area under the curve from $\bar{X}$ = 80.63 feet to X = 87 feet, and the area from $\bar{X}$ = 80.63 feet to 100 feet, and then take the difference between these two figures. The first proportionate area is obtained by using

$$x = 6.37 \text{ feet and } \frac{x}{s} = \frac{6.37}{20.95} = 0.30.$$

Appendix E shows that 0.1179 of the area is between $\bar{X}$ = 80.63 feet and X = 87 feet. We already know that 0.3212 of the area is between $\bar{X}$ = 80.63 feet and X = 100 feet, so the proportionate area between 87 feet and 100 feet is

0.3212 − 0.1179 = 0.2033, or about 20 per cent.
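
The three area problems just worked can be verified without a printed table of areas. The Python sketch below is illustrative: it assumes that the error function `math.erf` may stand in for the appendix, and it rounds x/s to two decimals to imitate the table lookup. The function names are hypothetical.

```python
from math import erf, sqrt

MEAN, S = 80.63, 20.95  # mean and standard deviation of the baseball throws


def area_from_mean(z):
    """Proportionate area between the mean and an ordinate z sigmas away."""
    return 0.5 * erf(abs(z) / sqrt(2))


def z_of(value):
    """x / s for a given distance, rounded to two decimals as in a table."""
    return round((value - MEAN) / S, 2)


at_least_100 = 0.5 - area_from_mean(z_of(100))
at_most_50 = 0.5 - area_from_mean(z_of(50))
between_87_and_100 = area_from_mean(z_of(100)) - area_from_mean(z_of(87))

print(round(at_least_100, 4))        # 100 feet or more
print(round(at_most_50, 4))          # 50 feet or less
print(round(between_87_and_100, 4))  # between 87 and 100 feet
```

The subtraction from 0.50 in the first two cases mirrors the procedure described in the text for a one-tail area.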

Referring to Table 23.3, the expected frequencies in each class of the 
frequency distribution are obtained as follows: 


1. In Column (1) of the table, enter the classes of the original distribution, allowing for one or two additional classes at each end, since the fitted curve should usually have a greater range than the sample. Theoretically, the fitted curve is of unlimited range in both directions. Allow two spaces for the class in which the mean falls.

2. In Column (2), write the lower limits of each class below the mean in value and the lower limit of the class which contains the mean.

3. In Column (3), write the upper limit of each class above the mean in value and the upper limit of the class which includes the mean.

... frequencies. This is of importance in making the $\chi^2$ test of Table 25.10.


Chart 23.7. Graphic Representation of the Procedure in Columns (6) and (7) of Table 23.3. The chart marks, on a scale of distance in feet with $\bar{X}$ = 80.63: 0.1064 of the area from 80.63 feet to 75 feet; 0.0832 of the area from 80.63 feet to 85 feet; 0.1064 + 0.0832 = 0.1896 of the area from 75 feet to 85 feet; 0.2549 of the area from 80.63 feet to 95 feet; 0.2549 − 0.0832 = 0.1717 of the area from 85 feet to 95 feet; 0.3770 of the area from 80.63 feet to 105 feet; and 0.3770 − 0.2549 = 0.1221 of the area from 95 feet to 105 feet.

4. We shall ascertain first the proportionate area between the mean (80.63 feet) and the upper limit (85 feet) of the class in which the mean falls. The deviation of the upper limit from the mean is 4.37 feet; this value is entered in Column (4). Since s = 20.95 feet,

$$\frac{x}{s} = \frac{4.37}{20.95} = 0.21.$$

This value is entered in Column (5). Now, looking up 0.21 in Appendix E, we find that 0.0832 of the area is between the mean and 85 feet. This value is entered in Column (6). The procedure is shown graphically in Chart 23.7.

5. The next step consists of determining the proportionate area between the mean and the upper limit of the first class above the mean. This limit is 95 feet; x = 14.37 feet and

$$\frac{x}{s} = \frac{14.37}{20.95} = 0.69.$$

Looking up 0.69 in Appendix E shows that 0.2549 of the area would be expected to be between the mean and 95 feet. This value is entered in Column (6). If 0.2549 of the area is found between 80.63 and 95 feet, while 0.0832 of the area occurs between 80.63 and 85 feet, there would be 0.2549 − 0.0832 = 0.1717 of the area between 85 and 95 feet. The result of this subtraction is entered in Column (7); this procedure is also indicated graphically in Chart 23.7.

6. The procedure in Step 5 is repeated for each class above the mean in 
value. The proportionate areas from the mean to the upper limit of 
each class are ascertained, and then the proportions from the mean to the 
upper limit of the preceding class are subtracted, as shown in the table. 

7. The proportionate areas between the mean and the lower limits 
shown in Column (2) of the table are next determined. Since these 
areas are also cumulative, successive subtraction is again necessary. 

8. We now have entered in Column (7) the proportionate areas for 
each class except the class containing the mean. We have determined, 
in Column (6), that 0.0832 of the area is between the mean and 85 feet, 
and that there is 0.1064 of the area between the mean and 75 feet. Add- 
ing these two figures gives 0.1896, the proportion of the area in this class 
[see Column (7) and Chart 23.7]. 

9. The total of Column (7) should be 1.0000, as there is 0.5000 of the 
area from the mean to either extreme of the distribution. In order to see 
the agreement between the observed and the expected frequencies, we 
include Column (8), which is obtained by multiplying 303 by the pro- 
portionate area of each class. 
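
The nine steps above can be condensed into a short Python sketch. It is an illustration under stated assumptions: `math.erf` takes the place of the table of areas, x/s is rounded to two decimals as in a table lookup, and the two extreme classes are treated as open-ended so that the proportions total 1.0000. The names `signed_area` and `class_proportions` are hypothetical.

```python
from math import erf, sqrt

MEAN, S, N = 80.63, 20.95, 303
LIMITS = list(range(15, 146, 10))  # class limits 15, 25, ..., 145 feet


def signed_area(limit):
    """Area between the mean and a class limit, negative below the mean."""
    z = round((limit - MEAN) / S, 2)
    area = 0.5 * erf(abs(z) / sqrt(2))
    return area if z >= 0 else -area


def class_proportions(limits):
    """Proportion of the curve in each class, including both open-ended tails."""
    cumulative = [-0.5] + [signed_area(limit) for limit in limits] + [0.5]
    # Successive subtraction of the cumulative areas, as in Column (7).
    return [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]


proportions = class_proportions(LIMITS)
expected = [N * share for share in proportions]  # Column (8)

for lower, share, count in zip([None] + LIMITS, proportions, expected):
    print(lower, round(share, 4), round(count, 1))
```

The first printed row is the open-ended class below 15 feet, and the last is the open-ended class of 145 feet and over.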

A comparison of the expected frequencies, shown in Column (8) of
Table 23.3, with the observed frequencies of Table 23.1 reveals a general
agreement of the figures, the difference being greatest for the class "85
but under 95 feet." A test of the "goodness of fit" of the normal curve
will be described in Chapter 25.

The normal curve and collar sizes. To illustrate another use of the 
normal curve, let us assume that a maker of collars is considering the 
production of a collar styled especially for college men. Consideration 
will, of course, be given to the number of collars of each size which should
be made. Since college men represent a selected group, it would be
desirable to adjust the manufacturing schedule to their particular requirements.
Extensive data on the circumference of the necks of college men
are not available, but Table 23.4 shows the neck measurements of 231
male college students. To fit a normal curve, we need $\bar{X}$ = 14.232 inches
and s = 0.719 inches. The column diagram of the observed data and the
fitted curve are shown in Chart 23.8.

Chart 23.8. Normal Curve Fitted to Neck Circumference of 231 Male College Students. Based on data of Table 23.4.

TABLE 23.4

Neck Circumference of 231 Male College Students

Mid-values (in inches)     Number of students
        12.5                       4
        13.0                      19
        13.5                      30
        14.0                      63
        14.5                      66
        15.0                      29
        15.5                      18
        16.0                       1
        16.5                       1
        Total                    231

Source of data confidential.

TABLE 23.5

Determination of Expected Distribution of Collar Sizes for Male College Students

Our problem, in this instance, is not to determine the expected proportion of college men having necks "12.75 but under 13.25" inches in circumference, "13.25 but under 13.75" inches in circumference, and so forth, but rather to determine the number of collars of each size (by half sizes) which should be made. Experience shows that, on the average, collars are worn about $\tfrac{3}{4}$ of an inch larger than the circumference of the neck. This means that collars size 14 would be worn by men whose necks averaged 13.25 inches, and, since we are dealing with half sizes, the necks would range from 13 to 13.5 inches in circumference. The first column of Table 23.5 lists the collar sizes, while the second column shows the corresponding neck circumferences. It is for these classes that we need to ascertain the theoretical frequencies. This is done in the remainder of the columns, and the expected frequencies (N = 1,000) are shown in Column (9). If our basic data are representative, there would be about 270 customers in a thousand calling for size 15 collars, 221 asking for size 14½, 213 requesting size 15½, and so on. It is interesting to observe that we might expect only 8 out of a thousand of this group to ask for size 13 or smaller and but 7 out of a thousand to require 17 or larger.
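
The expected demand for each collar size can be reproduced with a brief Python sketch. It is illustrative only: `math.erf` is assumed in place of the table of areas, x/s is rounded to two decimals as in a table lookup, and the names `signed_area`, `share_for_size`, and `ALLOWANCE` are hypothetical.

```python
from math import erf, sqrt

MEAN_NECK, S = 14.232, 0.719  # inches, from the 231 college students
ALLOWANCE = 0.75              # collars are worn about 3/4 inch larger than the neck


def signed_area(neck):
    """Area between the mean and a neck size, negative below the mean."""
    z = round((neck - MEAN_NECK) / S, 2)
    area = 0.5 * erf(abs(z) / sqrt(2))
    return area if z >= 0 else -area


def share_for_size(size):
    """Expected proportion of men taking a given collar size (half sizes)."""
    centre = size - ALLOWANCE  # average neck for this collar size
    return signed_area(centre + 0.25) - signed_area(centre - 0.25)


for size in (14.0, 14.5, 15.0, 15.5, 16.0):
    print(size, round(1000 * share_for_size(size)))

# Size 13 or smaller means necks under 12.5 inches; 17 or larger means 16 inches and over.
size_13_or_smaller = round(1000 * (0.5 + signed_area(12.5)))
size_17_or_larger = round(1000 * (0.5 - signed_area(16.0)))
print(size_13_or_smaller, size_17_or_larger)
```

The figures are expressed per thousand customers, as in the text.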

TABLE 23.6

Cumulative Distribution of Baseball Throws for Distance by 303 First-Year High School Girls

Distance in feet     Number of girls     Per cent of total
Less than  25               1                  0.33
Less than  35               3                  0.99
Less than  45              10                  3.30
Less than  55              35                 11.56
Less than  65              68                 22.44
Less than  75             121                 39.93
Less than  85             185                 61.06
Less than  95             229                 75.58
Less than 105             260                 85.81
Less than 115             287                 94.72
Less than 125             298                 98.35
Less than 135             302                 99.67
Less than 145             303                100.00

Cumulative data of Table 23.1.

Suitability of the normal curve. As previously pointed out, the normal curve is only one of a number of kinds of curves which may be fitted to a frequency distribution. It should in no sense be thought of as a form having general applicability to all distributions. Since this is true, what guides are there which will tell us when to fit a normal curve, or, when fitted, if it is suitable?

1. The plotted curve or column diagram of the sample distribution 
serves as a very crude guide. If there is marked skewness present, it will 
be apparent, as will also any irregularities. 

2. The sample data may be cumulated and put into percentage form, as in Table 23.6; these cumulative percentages may then be plotted on arithmetic probability paper, as in Chart 23.9. If the resulting curve is approximately a straight line, we may proceed with assurance to fit a normal curve.

The vertical scale is so designed that the ogive of a normal curve will appear as a straight line.

3. The values of $\beta_1$ and $\beta_2$ may be computed as described in Chapter 10, and, by methods which are set forth in Chapter 26, we may ascertain whether $\beta_1$ differs significantly from zero and whether $\beta_2$ differs significantly from 3.0. For the throws of a baseball by high school freshman girls, $\beta_1$ = 0.0104 and $\beta_2$ = 2.7724. Neither of these values differs significantly from the value for a normal curve.

4. After the curve has been fitted and the expected frequencies have been determined for the various classes, a test of "goodness of fit" may be made. This test is described in Chapter 25, and indicates that the fit of the normal curve to the data of baseball throws by girls is satisfactory.
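
The check made with arithmetic probability paper can be imitated numerically. The Python sketch below is an illustration, not the authors' procedure: it converts each cumulative percentage to the corresponding normal deviate with `statistics.NormalDist().inv_cdf` and measures how nearly the deviates fall on a straight line against the class limits. The final class (100 per cent) is omitted because it has no finite deviate; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

UPPER_LIMITS = [25, 35, 45, 55, 65, 75, 85, 95, 105, 115, 125, 135]
CUMULATIVE_PER_CENT = [0.33, 0.99, 3.30, 11.56, 22.44, 39.93,
                       61.06, 75.58, 85.81, 94.72, 98.35, 99.67]


def normal_deviates(per_cents):
    """Normal deviate matching each cumulative percentage (the paper's scale)."""
    standard = NormalDist()
    return [standard.inv_cdf(per_cent / 100) for per_cent in per_cents]


def correlation(xs, ys):
    """Coefficient of correlation, used here as a measure of straightness."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


deviates = normal_deviates(CUMULATIVE_PER_CENT)
r = correlation(UPPER_LIMITS, deviates)
print(r)
```

A coefficient close to 1 corresponds to a plot that is approximately a straight line on probability paper.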


BINOMIALS

It was previously shown that the expansion of a symmetrical binomial $(\tfrac{1}{2} + \tfrac{1}{2})^N$ may be approximated experimentally by tossing coins. An asymmetrical binomial may be expanded experimentally in a similar fashion.

Chart 23.9. Cumulative Distribution of Baseball Throws for Distance by 303 First-Year High School Girls, Shown on Arithmetic Probability Paper. Based on data of Table 23.6.

Experimental construction of skewed binomials. Let us consider, first, a single die, four sides of which are colored black. If we toss this die, it is apparent that the probability ($\pi$) of having a white side come up is 1 out of 3, or $\tfrac{1}{3}$, while the probability ($\tau = 1 - \pi$) of obtaining a black side is 2 out of 3, or $\tfrac{2}{3}$. Using A (which has no numerical value) to indicate the occurrence of a white side and B (which also has no numerical value) to indicate the non-occurrence of a white side, that is, the occurrence of a black side, we may express the situation as

$$\tau B + \pi A \quad \text{or} \quad \tfrac{2}{3}B + \tfrac{1}{3}A,$$

which indicates that, if the die (assumed to be perfectly balanced) is tossed 1,500 times, we should expect a black side to appear 1,000 times and a white side 500 times.

If, now, we toss two dice (each having four black sides), there may appear either no white faces (2 black faces), one white face (a white face and a black face), or two white faces. The expression is

$$(\tfrac{2}{3}B + \tfrac{1}{3}A)^2 = \tfrac{4}{9}B^2 + \tfrac{4}{9}BA + \tfrac{1}{9}A^2.$$

Therefore, if 1,800 throws are made, we should expect to obtain no white faces 800 times, one white face 800 times, and two white faces 200 times.

Chart 23.10. Expected Results of 59,049 Throws of 10 Dice, Each Having Four Black Sides and Two White Sides. The expected occurrences are given by

$$(\tfrac{2}{3}B + \tfrac{1}{3}A)^{10} = \tfrac{1{,}024}{59{,}049}B^{10} + \tfrac{5{,}120}{59{,}049}B^9A + \tfrac{11{,}520}{59{,}049}B^8A^2 + \tfrac{15{,}360}{59{,}049}B^7A^3 + \tfrac{13{,}440}{59{,}049}B^6A^4 + \tfrac{8{,}064}{59{,}049}B^5A^5 + \tfrac{3{,}360}{59{,}049}B^4A^6 + \tfrac{960}{59{,}049}B^3A^7 + \tfrac{180}{59{,}049}B^2A^8 + \tfrac{20}{59{,}049}BA^9 + \tfrac{1}{59{,}049}A^{10}.$$

If three such dice are thrown, the expression is

$$(\tfrac{2}{3}B + \tfrac{1}{3}A)^3 = \tfrac{8}{27}B^3 + \tfrac{12}{27}B^2A + \tfrac{6}{27}BA^2 + \tfrac{1}{27}A^3.$$

It will be observed that the binomial is beginning to show its skewed nature. This will be more clearly seen if we consider throwing ten dice, each with four black sides. The expression is $(\tfrac{2}{3}B + \tfrac{1}{3}A)^{10}$, which is shown graphically in Chart 23.10. The curve is definitely skewed as a result of the fact that $\tau$ and $\pi$ are unequal.

If $\tau$ is a larger fraction and $\pi$ is smaller, the skewness will be even greater. Let us consider as an illustration a four-sided pyramidal die with one white side and three black sides. It will be necessary to consider the "down" side as the one obtained at a throw. For throwing one die, the expression is $\tfrac{3}{4}B + \tfrac{1}{4}A$.

If 10 of these four-sided dice are thrown, their behavior is indicated by $(\tfrac{3}{4}B + \tfrac{1}{4}A)^{10}$. The expansion of this binomial is shown in Chart 23.11, which is noticeably more skewed than the curve of Chart 23.10.
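
The terms of these skewed binomials are readily generated by machine. The Python sketch below is illustrative; the function name `expected_white_faces` is an assumption, and exact fractions are used so that the expected occurrences come out as whole numbers when the number of throws is a suitable power of 3 or 4.

```python
from fractions import Fraction
from math import comb


def expected_white_faces(dice, white_probability, throws):
    """Expected number of throws showing 0, 1, ..., `dice` white faces."""
    pi_ = Fraction(white_probability)  # probability of a white face
    tau = 1 - pi_                      # probability of a black face
    return [throws * comb(dice, k) * tau ** (dice - k) * pi_ ** k
            for k in range(dice + 1)]


# Two dice with four black sides each, thrown 1,800 times.
print([int(c) for c in expected_white_faces(2, Fraction(1, 3), 1800)])

# Ten six-sided dice (59,049 throws) and ten four-sided dice (1,048,576 throws).
six_sided = expected_white_faces(10, Fraction(1, 3), 3 ** 10)
four_sided = expected_white_faces(10, Fraction(1, 4), 4 ** 10)
print([int(c) for c in six_sided])
print([int(c) for c in four_sided])
```

Comparing the two ten-dice lists shows the greater skewness that results when the probability of a white face is reduced from 1/3 to 1/4.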

A Four-Sided Die, Each Side of Which Is an Equilateral Triangle.

Chart 23.11. Expected Results of 1,048,576 Throws of 10 Four-Sided Dice, Each Having Three Black Sides and One White Side. The expected occurrences are given by

$$(\tfrac{3}{4}B + \tfrac{1}{4}A)^{10} = \tfrac{59{,}049}{1{,}048{,}576}B^{10} + \tfrac{196{,}830}{1{,}048{,}576}B^9A + \tfrac{295{,}245}{1{,}048{,}576}B^8A^2 + \tfrac{262{,}440}{1{,}048{,}576}B^7A^3 + \tfrac{153{,}090}{1{,}048{,}576}B^6A^4 + \tfrac{61{,}236}{1{,}048{,}576}B^5A^5 + \tfrac{17{,}010}{1{,}048{,}576}B^4A^6 + \tfrac{3{,}240}{1{,}048{,}576}B^3A^7 + \tfrac{405}{1{,}048{,}576}B^2A^8 + \tfrac{30}{1{,}048{,}576}BA^9 + \tfrac{1}{1{,}048{,}576}A^{10}.$$

TABLE 23.7

Number of Male Pigs Born in Litters of Five

Number of males     Number of litters having specified number of males
       0                              2
       1                             20
       2                             41
       3                             35
       4                             14
       5                              4
     Total                          116

Data from A. S. Parkes, "Studies on the Sex-Ratio and Related Phenomena, The Frequencies of Sex Combinations in Pig Litters," Biometrika, Vol. 16 (1923), pp. 373-381. Parkes fits a binomial to the same series using p = 0.4876, as determined for litters of 4 to 12 pigs. His expected frequencies are identical with ours.

Fitting a binomial. It is apparent from the expression for a binomial that it is a device most useful for fitting to discrete data. In order to fit a binomial to a series of observed data, the following three steps are necessary: (1) Determine the proper value of $\pi$, which also gives us $\tau$, since $\tau = 1 - \pi$. The size of $\pi$ determines the degree of skewness of the curve. If $\pi$ = 0.50, then $\tau$ = 0.50 and the curve is symmetrical. The farther removed $\pi$ is from 0.50, in either direction, the greater the skewness. If $\pi$ < 0.50, the curve is positively skewed; if $\pi$ > 0.50, it is negatively skewed. When population values ($\pi$ and $\tau$) are not known, or when a reasonable assumption concerning them cannot be made, we have no alternative but to employ proportions determined from the sample. These we call p and q. (2) Expand the binomial $(\tau + \pi)^N$ or $(q + p)^N$, where N = the number of categories minus one, since there are N + 1 terms in the expanded binomial. N is also the number of items in a sample. (3) Multiply each of the fractions of the expanded binomial by k, the number of samples.

Ta!)le 23.7 shows a distribution of the number of male pigs occurring in 
litters of five pigs. The data are for 116 such litters; so W = 5 and 
k “ 116. Altogether there are 5 X 116 — 580 pigs of both sexes and 
(0 X 2) + (1 X 20) + (2 X 41) + (3 X 35) + (4 X 14) + (5 X 4) - 
283 male pigs. The proportion of male pigs, p, is therefore 


2 ^ 

580 


- 0.4879 


and q - 0.5121. 

As pointed out above, the fitting is accomplished by expanding $k(q + p)^N$.
Substituting 5 for $N$, but retaining the other symbols, we have

$$k(q + p)^5 = k(q^5 + 5q^4p + 10q^3p^2 + 10q^2p^3 + 5qp^4 + p^5),$$

where the exponent of $p$ indicates the number of males born in a litter of 5.

The numerical expression to use in fitting the binomial is
$(0.5121 + 0.4879)^5$ and, since $k = 116$, we should expand
$116(0.5121 + 0.4879)^5$.

TABLE 23.8

Binomial $k(q + p)^N$ Fitted to Distribution of Number of Male Pigs Born in Litters of Five


Chart 23.12. Binomial Fitted to Distribution of Number of Male
Pigs Born in Litters of Five. Data from Tables 23.7 and 23.8.
(Vertical axis: number of litters.)

This becomes

$$116[(0.5121)^5 + 5(0.5121)^4(0.4879) + 10(0.5121)^3(0.4879)^2 + 10(0.5121)^2(0.4879)^3 + 5(0.5121)(0.4879)^4 + (0.4879)^5].$$

The computations are most conveniently carried out by means of loga- 
rithms, as shown in Table 23.8. Although the powers could be obtained 
and the multiplications could be performed for this problem by the use 
of a calculating machine, the use of logarithms is essential when a bino- 
mial is raised to an appreciably higher power. 
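
As a rough modern counterpart to the logarithmic computation, the fit can be sketched in a few lines of Python. This is only an illustration under the definitions above ($N = 5$, $k = 116$, $p$ taken from the sample); the variable names are ours, and direct multiplication is used in place of logarithms.

```python
from math import comb

# Observed number of litters for each number of males (Table 23.7).
observed = {0: 2, 1: 20, 2: 41, 3: 35, 4: 14, 5: 4}

N = 5                                    # items per sample (pigs per litter)
k = sum(observed.values())               # number of samples (litters)
males = sum(m * f for m, f in observed.items())
p = males / (N * k)                      # sample proportion of males
q = 1 - p

# Each term of k(q + p)^N is k * C(N, m) * q^(N - m) * p^m,
# where m is the number of males in the litter.
expected = [k * comb(N, m) * q ** (N - m) * p ** m for m in range(N + 1)]

for m, e in enumerate(expected):
    print(f"{m} males: observed {observed[m]:3d}, expected {e:6.2f}")
```

Because the terms of $(q + p)^N$ sum to 1, the expected frequencies necessarily total $k$.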

Chart 23.12 shows the observed and the expected frequencies. The
observed data have been presented by means of separated bars to suggest
the discrete nature of the series. A test of "goodness of fit," similar to
that described in Chapter 25, indicates good agreement between the
observed and expected frequencies.

It should not be assumed that all discrete series may be fitted by the
method just explained. Some data are better described by other
distributions, as, for example, the Poisson, the fitting of which is
described elsewhere by one of the writers.⁶

SKEWED CURVES

The binomials just discussed are suitable for fitting to discrete data, but 
are not accurate enough to use with continuous data. A fitted binomial 
consists of a series of ordinates erected at specific points on the X-axis 
(see Chart 23.12). If this procedure were applied to a distribution of con- 
tinuous data (or to discrete data where the X units are small in relation to 

Chart 23.13. Logarithmic Normal Curve Fitted to Kilowatt Hours
of Electricity Used per Month in 282 Medium-Class Homes in an Eastern
City. Based on data of Table 23.9. (Vertical axis: number of homes;
horizontal axis: kilowatt hours.)

the class interval), we should be erecting an ordinate at the mid-value of 
each class, instead of determining the area under a smooth curve. Obvi- 
ously, the greater the number of classes, the less would be the difference 
between these two procedures. 

There are a great many types of skewed curves which may be fitted to 
frequency distributions. It is the purpose of this volume, not to enter 
into an extended consideration of this topic, but merely to sketch briefly 
the procedure involved in fitting two of the simpler types.⁷

The logarithmic normal curve. Some distributions which are
skewed to the right become symmetrical when plotted in terms of the

⁶ See the reference given at the end of note 4.

⁷ For a more detailed discussion, see: W. P. Elderton, Frequency Curves and
Correlation, Cambridge University Press, Cambridge, England, 1953 (4th
edition); H. L. Rietz, Mathematical Statistics, Open Court Publishing Co.,
Chicago, 1927; Arne Fisher, Mathematical Theory of Probabilities, The
Macmillan Company, New York, 1922 (2nd edition).
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logarithms of their X values or, alternately, when plotted on graph paper 
having a logarithmic X-scale. The column diagram of Chart 23.13 
shows the monthly use of electricity by 282 medium-class homes in an 
eastern city, drawn from the data of Table 23.9. It is apparent that the
series is decidedly skewed in a positive direction. In Chart 23.14 these
Chart 23.14. Kilowatt Hours of Electricity Used per Month in 282
Medium-Class Homes in an Eastern City. Logarithmic X-scale. Data
of Table 23.9. Frequencies are plotted at logarithmic mid-values of classes.
(Vertical axis: number of homes.)

TABLE 23.9

Kilowatt Hours of Electricity Used per Month in Medium-Class Homes in
an Eastern City

Kilowatt hours      Number
(mid-values)        of homes
     10                25
     14                50
     18                53
     22                48
     26                36
     30                26
     34                19
     38                 8
     42                 6
     46                 3
     50                 4
     54                 2
     58                 2
   Total              282

Data from Electrical Testing Laboratories, New York City. Name of city
withheld by request.

Chart 23.15. Kilowatt Hours of Electricity Used per Month in
282 Medium-Class Homes in an Eastern City. Shown on logarithmic
probability paper. Based on data of Table 23.9. (Vertical axis: per cent
of homes.)

data have been re-plotted but against a logarithmic X-scale. When the 
curve is extended to the horizontal axis at X = 6 kilowatt hours (the 
class just below the first one shown in the table), the approximate sym- 
metrical nature of the series in terms of logarithmic X values is apparent. 
A further indication of this is shown in Chart 23.15, which presents the 
cumulative percentage frequencies plotted on logarithmic probability 
paper.

Fitting a logarithmic normal curve. The procedure for fitting a
logarithmic normal curve has been given by Davies⁸ and is essentially the
same process as that of fitting a normal curve, except that we use the
arithmetic mean $\bar{X}_{\log}$ and the standard deviation $s_{\log}$ of the
logarithms of the X values. The values of $\bar{X}_{\log}$ and $s_{\log}$ may
be computed by making use of the mid-values of the logarithms of the class
limits. Ideally the classes should be so chosen that the class intervals are
equal in a logarithmic sense, thus making the logarithmic mid-values
equidistant from each other. Usually we deal with ready-formed frequency
distributions of arithmetically equal class intervals, and with such
distributions the direct computation of $\bar{X}_{\log}$ and $s_{\log}$ is
laborious. The inconvenience of computing these logarithmic values has been
eliminated by Davies, who gives formulas based upon the quartiles, which are
readily computed. Furthermore, according to Davies, there are certain
advantages to the procedure. He says: "Unless the data are very regular,
these [$\bar{X}_{\log}$ and $s_{\log}$] may be more satisfactorily computed
from the quartiles, thus avoiding the disturbing effects of irregular extreme
items." The expressions are given below.

$$\bar{X}_{\log} = \frac{\log Q_1 + \log Q_3 + 1.2554 \log Q_2}{3.2554}.$$

This is the weighted average of the three quartiles, the weights being
proportional to the heights of normal-curve ordinates erected at these
values.

$$s_{\log} = 0.7413(\log Q_3 - \log Q_1).$$

This expression grows out of the fact that in a normal curve 50 per cent
of the items are included within $\pm Q$ of the median (or mean), and also
that 50 per cent of the items are included within $\pm 0.6745s$ of the mean.
It is therefore obvious that

$$s = \frac{1}{0.6745}\,Q = 1.4825Q.$$

Since

$$Q = \frac{Q_3 - Q_1}{2},$$

it follows that

$$Q_3 - Q_1 = 2Q, \quad \text{and} \quad s = 0.7413(Q_3 - Q_1).$$

For the data of electric consumption, $Q_1 = 15.6400$ kwh., $Q_2$ (the
median) $= 21.0833$ kwh., and $Q_3 = 27.9444$ kwh.


⁸ G. R. Davies and W. F. Crowder, Methods of Statistical Analysis, pp. 303-306;
and G. R. Davies, "Analysis of Frequency Distributions," Journal of the American
Statistical Association, Vol. 24, December 1929, pp. 349-366.

$$\bar{X}_{\log} = \frac{\log 15.6400 + \log 27.9444 + 1.2554 \log 21.0833}{3.2554}$$

$$= \frac{1.194237 + 1.446295 + 1.2554(1.323939)}{3.2554}$$

$$= \frac{4.302605}{3.2554} = 1.321682.$$

$$s_{\log} = 0.7413(\log 27.9444 - \log 15.6400)$$

$$= 0.7413(1.446295 - 1.194237)$$

$$= 0.7413(0.252058)$$

$$= 0.186851.$$

Using these two values, the expected frequencies in each class may be 
determined in a manner strictly parallel to that previously described for 
the normal curve. As before, Appendix E is used and the procedure is 
set forth in Table 23.10. 
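
The quartile formulas are easy to check numerically. The following is a minimal sketch, assuming common (base-10) logarithms as in the worked figures above; the function names are ours and are not taken from Davies.

```python
from math import log10

# Quartiles of the electricity data, in kilowatt hours.
q1, q2, q3 = 15.6400, 21.0833, 27.9444

def log_mean_from_quartiles(q1, q2, q3):
    """Weighted average of the log quartiles (median weighted 1.2554)."""
    return (log10(q1) + log10(q3) + 1.2554 * log10(q2)) / 3.2554

def log_sd_from_quartiles(q1, q3):
    """Standard deviation of the logarithms from the interquartile range."""
    return 0.7413 * (log10(q3) - log10(q1))

x_log = log_mean_from_quartiles(q1, q2, q3)
s_log = log_sd_from_quartiles(q1, q3)
print(round(x_log, 6), round(s_log, 6))
```

The weight 1.2554 is the ratio of the normal-curve ordinate at the median to the ordinate at a quartile, which is why the divisor is $1 + 1 + 1.2554 = 3.2554$.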

The ordinates are computed from the expression⁹

$$Y = \frac{0.4343\,Ni}{2.5066\,X s_{\log}}\; 2.71828^{-\frac{x_{\log}^2}{2 s_{\log}^2}},$$

which may be simplified for purposes of computation to

$$Y = \frac{0.17326\,Ni}{X s_{\log}}\; 2.71828^{-\frac{x_{\log}^2}{2 s_{\log}^2}}.$$

$X$ is the arithmetic value of the point on the X-axis at which the ordinate
is to be erected. The values of $2.71828^{-x_{\log}^2/(2 s_{\log}^2)}$ are
obtained from Appendix D and the $x_{\log}/s_{\log}$ values are given by

$$\frac{x_{\log}}{s_{\log}} = \frac{\log X - \bar{X}_{\log}}{s_{\log}}.$$

⁹ It will be recalled that the expression for the normal curve is

$$Y = \frac{Ni}{2.5066\,s}\; 2.71828^{-\frac{x^2}{2s^2}}.$$

For fitting the logarithmic normal curve, the expression cannot be used in
this form, since $s$ is in terms of logarithms while the class intervals $i$
are equal arithmetically. We therefore multiply $i$ by the adjustment factor
$\frac{\log e}{X}$ or $\frac{0.4343}{X}$ to compensate for the fact that the
intervals are not geometrically equal. We thus have

$$Y = \frac{0.4343\,Ni}{2.5066\,X s_{\log}}\; 2.71828^{-\frac{x_{\log}^2}{2 s_{\log}^2}}.$$

The procedure for determining the ordinates parallels that for the normal 
curve which was shown in Table 23.2. The fitted curve is shown in 
Chart 23.13 and the correspondence between that curve and the column 
diagram is apparent. 
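
The simplified ordinate formula can be sketched as follows. This is an illustration only: it assumes $N = 282$ and a class interval of $i = 4$ kilowatt hours (the spacing of the mid-values in Table 23.9), uses the values of $\bar{X}_{\log}$ and $s_{\log}$ found above, and computes the exponential directly rather than from Appendix D. The function name is ours.

```python
from math import exp, log10

N_HOMES = 282        # total frequency, Table 23.9
INTERVAL = 4         # class interval in kwh (assumed from the mid-values)
X_LOG = 1.321682     # mean of the logarithms, from the quartile formula
S_LOG = 0.186851     # standard deviation of the logarithms

def log_normal_ordinate(x, n=N_HOMES, i=INTERVAL, x_log=X_LOG, s_log=S_LOG):
    """Ordinate Y of the fitted logarithmic normal curve at arithmetic value x."""
    z = (log10(x) - x_log) / s_log          # x_log / s_log in the text
    return 0.17326 * n * i / (x * s_log) * exp(-z * z / 2)

mid_values = range(10, 59, 4)               # 10, 14, ..., 58
ordinates = {x: log_normal_ordinate(x) for x in mid_values}

for x, y in ordinates.items():
    print(f"X = {x:2d} kwh: ordinate = {y:6.2f}")
```

The ordinates are heights of the curve at the mid-values, so they only approximate the expected class frequencies, which in the text are obtained from areas.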

Davies suggests a logarithmic coefficient of skewness

$$Sk_{\log} = \frac{\log Q_1 + \log Q_3 - 2 \log Q_2}{\log Q_3 - \log Q_1}$$

and points out that a series which yields a coefficient of less than 0.15
(or perhaps even 0.20) may tentatively be considered as logarithmically
normal. If, however, a skewed distribution is not inherently logarithmic,
Davies notes that it may sometimes be adjusted by shifting the X values
until the desired skewness is obtained; after fitting, the X values are again
shifted. This correction $c$ is obtained by

$$c = \frac{Q_2^2 - Q_1 Q_3}{Q_1 + Q_3 - 2Q_2}.$$

This value is added to the class limits and to the quartiles, after which
$\bar{X}_{\log}$ and $s_{\log}$ are computed. The fitting proceeds as in Table
23.10, but the shifted class limits are used. After the expected frequencies
have been ascertained, the class limits are shifted back to their original
values. It is obvious that this device extends the usefulness of the
logarithmic normal curve.
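
A short sketch may make the two expressions concrete. It applies them to the electricity quartiles purely as a numerical illustration (that series is already close to logarithmically normal, so no shift would ordinarily be made); the function names are ours.

```python
from math import log10

def log_skewness(q1, q2, q3):
    """Davies' logarithmic coefficient of skewness from the quartiles."""
    return (log10(q1) + log10(q3) - 2 * log10(q2)) / (log10(q3) - log10(q1))

def shift_correction(q1, q2, q3):
    """Constant c which, added to the X values, makes the log quartiles symmetrical."""
    return (q2 ** 2 - q1 * q3) / (q1 + q3 - 2 * q2)

q1, q2, q3 = 15.6400, 21.0833, 27.9444      # electricity data, kwh

sk = log_skewness(q1, q2, q3)
c = shift_correction(q1, q2, q3)
sk_shifted = log_skewness(q1 + c, q2 + c, q3 + c)

print(f"Sk_log = {sk:.4f}, c = {c:.4f}, Sk_log after shift = {sk_shifted:.2e}")
```

The correction follows from requiring $(Q_1 + c)(Q_3 + c) = (Q_2 + c)^2$, so the shifted quartiles have a logarithmic skewness of zero apart from rounding.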

Fitting a normal curve with adjustment for skewness. The
formulas previously given for the normal curve enabled us to fit a
symmetrical curve from a knowledge of $\bar{X}$, $s$, and $N$. We have just
considered one method of fitting a skewed curve. Another procedure that
is useful for certain skewed distributions consists of using also a measure
of skewness and thereby making a correction to the fit of a
normal curve. This is sometimes referred to as a second approximation
curve. The equation¹⁰ is

$$Y = \frac{Ni}{2.5066\,s}\; 2.71828^{-\frac{x^2}{2s^2}} - \left\{ \frac{Ni}{2.5066\,s}\; 2.71828^{-\frac{x^2}{2s^2}}\; \frac{\alpha_3}{2} \left( \frac{x}{s} - \frac{x^3}{3s^3} \right) \right\}.$$

¹⁰ The expression includes the first two terms of the Gram-Charlier series. For a
further discussion, see W. A. Shewhart, Economic Control of Quality of Manufactured
Product, pp. 84-94, D. Van Nostrand Company, Inc., New York, 1931.

TABLE 23.11

Computation of $\bar{X}$, $s$, and $\alpha_3$ for Depth of Sapwood

Depth in inches
 (mid-values)       f       d'      fd'      f(d')²     f(d')³
     1.0            2      -7      -14          98        -686
     1.3           29      -6     -174       1,044      -6,264
     1.6           62      -5     -310       1,550      -7,750
     1.9          106      -4     -424       1,696      -6,784
     2.2          153      -3     -459       1,377      -4,131
     2.5          186      -2     -372         744      -1,488
     2.8          193      -1     -193         193        -193
     3.1          188       0        0           0           0
     3.4          151       1      151         151         151
     3.7          123       2      246         492         984
     4.0           82       3      246         738       2,214
     4.3           48       4      192         768       3,072
     4.6           27       5      135         675       3,375
     4.9           14       6       84         504       3,024
     5.2            5       7       35         245       1,715
     5.5            1       8        8          64         512
    Total       1,370             -849      10,339     -12,249

Data from W. A. Shewhart, Economic Control of Quality of Manufactured Product,
p. 77, D. Van Nostrand Co., New York, 1931. Courtesy of D. Van Nostrand Co., Inc.


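$$\nu_1 = \frac{\Sigma f d'}{N} = -0.619708.$$

$$\nu_2 = \frac{\Sigma f (d')^2}{N} = 7.546715.$$

$$\nu_3 = \frac{\Sigma f (d')^3}{N} = -8.940876.$$

$$\bar{X} = \bar{X}_d + \nu_1 i = 3.1 - [(0.619708)(0.3)] = 2.9141 \text{ inches}.$$

Since Sheppard's correction is not applied, we have

$$\pi_2 = \nu_2 - \nu_1^2 = 7.162677.$$

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = 4.613422.$$

$$s = i\sqrt{\pi_2} = 0.8029 \text{ inches}.$$

$$\alpha_3 = \frac{\pi_3}{\sqrt{\pi_2^3}} = +0.2407.$$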
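
The moment computations of Table 23.11 can be reproduced with a brief sketch. It assumes the frequencies and step deviations exactly as tabulated, a class interval of $i = 0.3$ inch, and an assumed mean $\bar{X}_d = 3.1$ inches; the variable names are ours.

```python
from math import sqrt

# Frequencies for mid-values 1.0, 1.3, ..., 5.5 inches (Table 23.11).
freqs = [2, 29, 62, 106, 153, 186, 193, 188, 151, 123, 82, 48, 27, 14, 5, 1]
devs = list(range(-7, 9))      # d': deviations from 3.1 in class intervals
i = 0.3                        # class interval, inches
x_d = 3.1                      # assumed mean, inches

n = sum(freqs)
v1 = sum(f * d for f, d in zip(freqs, devs)) / n        # first moment about x_d
v2 = sum(f * d ** 2 for f, d in zip(freqs, devs)) / n   # second moment about x_d
v3 = sum(f * d ** 3 for f, d in zip(freqs, devs)) / n   # third moment about x_d

mean = x_d + v1 * i
pi2 = v2 - v1 ** 2                         # second moment about the mean
pi3 = v3 - 3 * v1 * v2 + 2 * v1 ** 3       # third moment about the mean
s = i * sqrt(pi2)                          # standard deviation, inches
alpha3 = pi3 / pi2 ** 1.5                  # skewness measure

print(f"mean = {mean:.4f}, s = {s:.4f}, alpha3 = {alpha3:+.4f}")
```

No Sheppard's correction is applied here, in keeping with the text.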


The expression preceding the minus sign is that for the normal curve,
while the expression in braces represents the modification for skewness.
In order to determine the expected frequencies, the above equation must
be integrated. This is accomplished by the use of tables. To use these
tables, we write

$$F_1 - \alpha_3 F_2,$$

where $F_1$ represents the areas of the normal curve (given in Appendix
E) and $\alpha_3 F_2$ represents the modification for skewness. Values of
$F_2$ are obtained from Appendix F and are then multiplied by $\alpha_3$.

Chart 23.16. Second Approximation Curve Fitted to Depth of Sapwood.
Based on data of Table 23.11.

As an illustration of this method of fitting, we use the data of Table
23.11, which are shown graphically in Chart 23.16. The fitting
procedure¹¹ for a second approximation curve is shown in Table 23.12. The
values of $N$, $\bar{X}$, $s$, and $\alpha_3$ having been obtained (Table
23.11), the steps are as follows:


1. Make entries in Columns (1) to (6) inclusive, as was done in fitting
a normal curve.

2. Refer to Appendix F and enter in Column (7) the $F_2$ values

¹¹ Sheppard's correction has not been applied in the computation of the second
moment, partly because high contact is not present at the left in Chart 23.16.
Furthermore, Shewhart points out (op. cit., p. 78) that the corrected standard
deviation (0.798211) differs more from the standard deviation of the ungrouped
data (0.802555) than does the uncorrected standard deviation (0.802895). When
high contact is not present at both ends of a distribution, overcorrection of a
moment is not unusual. It arises because the corrections allow for non-existent
classes at the extremes.




TABLE 23.12

Determination of Expected Frequencies for Data of Depth of Sapwood by Means of a Second Approximation Curve

($\bar{X} = 2.9141$ inches; $s = 0.8029$ inches; $\alpha_3 = +0.2407$)

Only fragments of the table body survive: Column (9), headed
"[Col. 6 - Col. 8]," with an entry of 0.5160; and Column (10), headed
"proportionate frequencies," with entries 0.0067, 0.0185, 0.0403, ...,
0.1493, 0.1403, ..., 0.0094, ..., 0.0001.

associated with each $\frac{x}{s}$ value of Column (5). Negative signs are
entered in this column for the percentages associated with class limits of
Column (2).

3. In Column (8), multiply each value of Column (7) by $\alpha_3$. Signs
are shown.

4. To produce Column (9), the values in Column (8) are subtracted
algebraically from the values in Column (6).

5. The cumulative proportionate frequencies of Column (9) are
decumulated in Column (10), as was done for the normal curve. The
result is a series of figures showing expected frequencies on the basis of the
second approximation for $N = 1.0000$. One of the shortcomings of this
curve is that it may occasionally produce negative frequencies at one end,
or, if we do not extend the fit far enough to produce these negative
frequencies, the total may slightly exceed 1.0000. In this instance Column
(10) totals 1.0002.

6. In Column (11) the expected frequencies are prorated among the
classes so that the total equals $N$ for the sample.
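
The steps above can be sketched in code. This is not the tabular method itself: instead of reading $F_1$ and $F_2$ from Appendices E and F, the sketch integrates the second approximation equation in closed form, which gives a cumulative proportion of $\Phi(z) - \frac{\alpha_3}{6}(z^2 - 1)\phi(z)$ at $z = x/s$, where $\Phi$ and $\phi$ are the normal area and ordinate. It also assumes class limits 0.15 inch on either side of each mid-value. Function and variable names are ours.

```python
from math import erf, exp, pi, sqrt

MEAN, S, ALPHA3, N_ITEMS = 2.9141, 0.8029, 0.2407, 1370

def normal_cdf(z):
    """Area of the unit normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_pdf(z):
    """Ordinate of the unit normal curve at z."""
    return exp(-z * z / 2) / sqrt(2 * pi)

def second_approx_cdf(z, alpha3=ALPHA3):
    """Cumulative proportion under the second approximation curve."""
    return normal_cdf(z) - alpha3 / 6 * (z * z - 1) * normal_pdf(z)

mid_values = [round(1.0 + 0.3 * j, 1) for j in range(16)]   # 1.0, 1.3, ..., 5.5

proportions = []
for mid in mid_values:
    z_low = (mid - 0.15 - MEAN) / S      # lower class limit in units of s
    z_high = (mid + 0.15 - MEAN) / S     # upper class limit in units of s
    proportions.append(second_approx_cdf(z_high) - second_approx_cdf(z_low))

expected = [N_ITEMS * p for p in proportions]

for mid, p, e in zip(mid_values, proportions, expected):
    print(f"{mid:3.1f} in.: proportion {p:.4f}, expected {e:7.1f}")
```

Differencing the cumulative proportions corresponds to the decumulation of step 5. Because the sketch stops at the sixteen tabulated classes, its proportions total slightly less than 1; the final proration of step 6 is not carried out here.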



Symbols Used in Chapter 24

$\beta_{1\mathcal{P}}$: lower-case Greek beta; skewness in a population.
$\beta_{1\bar{X}}$: skewness of the distribution of sample $\bar{X}$ values.
$\beta_{2\mathcal{P}}$: kurtosis in a population.
$\beta_{2\bar{X}}$: kurtosis of the distribution of sample $\bar{X}$ values.
$D$: a difference between paired values.
$d'$: deviation, in terms of class intervals, of $X$ from $\bar{X}_d$.
$F$: $\hat{\sigma}_1^2 / \hat{\sigma}_2^2$; see Chapter 26.
$f$: frequency.
$k$: number of samples. $k$ will ordinarily be much smaller than $K$.
$K$: the number of possible samples of a given size from a population.
$n$: degrees of freedom in a sample. When two samples are under
  consideration, $n = n_1 + n_2$.
$N$: the number of items in a sample.
$P$: probability; varies from 0 to 1.
$\mathcal{P}$: the number of items in a population. As a subscript,
  $\mathcal{P}$ means "population," thus $\bar{X}_{\mathcal{P}}$ is the
  arithmetic mean of a population.
$r$: the correlation coefficient.
$s$: the standard deviation of a sample.
$\sigma$: lower-case Greek sigma; the standard deviation of a population.
$\hat{\sigma}$: the estimated standard deviation of a population, computed
  from a single sample. Referred to as "sigma caret" or "sigma hat."
  $\hat{\sigma}_1$ is an estimate based on sample 1.
  $\hat{\sigma}_2$ is an estimate based on sample 2.
  $\hat{\sigma}_{1+2}$ is an estimate computed by pooling values and degrees
  of freedom from two samples.
$\hat{\sigma}_D$: the estimated population standard error for a series of $D$
  values.
$\sigma_{\bar{X}}$: the standard error of $\bar{X}$. When two samples are
  under consideration, we use $\sigma_{\bar{X}_1}$ and $\sigma_{\bar{X}_2}$.
$\hat{\sigma}_{\bar{X}}$: the estimated standard error of $\bar{X}$.
$\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}$: the estimated standard error of the
  difference between two sample arithmetic means.
$\hat{\sigma}_{\bar{X}_D}$: the estimated standard error of $\bar{X}_D$.
$\Sigma$: upper-case Greek sigma, meaning "take the sum of."
$t$: $\frac{\bar{X} - \bar{X}_{\mathcal{P}}}{\hat{\sigma}_{\bar{X}}}$,
  $\frac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}}$, or
  $\frac{\bar{X}_D}{\hat{\sigma}_{\bar{X}_D}}$.
$x$: $X - \bar{X}$; also, $\bar{X} - \bar{X}_{\mathcal{P}}$ in the expression
  $\frac{x}{\sigma}$, which see.
$x_1$: a deviation of a value in series 1 from $\bar{X}_1$;
  $\Sigma x_1^2 = \Sigma (X_1 - \bar{X}_1)^2$.
$x_2$: a deviation of a value in series 2 from $\bar{X}_2$;
  $\Sigma x_2^2 = \Sigma (X_2 - \bar{X}_2)^2$.
$X$: an observed value in a sample.
$X_1$: an observed value in sample 1.
$X_2$: an observed value in sample 2.
$\bar{X}$: the arithmetic mean of a sample.
$\bar{X}_1$: the arithmetic mean of sample 1.
$\bar{X}_2$: the arithmetic mean of sample 2.
$\bar{X}_D$: the arithmetic mean of a series of $D$ values.
$\bar{X}_{\mathcal{P}}$: the arithmetic mean of a population.
$\bar{X}_{\mathcal{P}_1}$: the lower confidence limit of $\bar{X}_{\mathcal{P}}$.
$\bar{X}_{\mathcal{P}_2}$: the upper confidence limit of $\bar{X}_{\mathcal{P}}$.
$\frac{x}{\sigma}$: a deviation divided by its standard error, for example,
  $\frac{\bar{X} - \bar{X}_{\mathcal{P}}}{\sigma_{\bar{X}}}$.
$\chi$: lower-case Greek chi. See Chapter 25.



CHAPTER 24 


Statistical Significance I: Arithmetic Means


In this and the two following chapters, we shall be interested in the 
behavior of statistical measures computed from samples. This is an 
important topic, since the statistical worker will nearly always be dealing 
with data which constitute a sample rather than a population. Usually, 
it is not possible to consider all of the items in a population. For exam- 
ple, it would be utterly impracticable to attempt to obtain data of the 
heights of all the adult males in the United States. If data of this sort 
were needed, a much smaller expenditure of time and money would be 
involved if a suitable sample were to be studied. Furthermore, the study 
of a properly representative sample can be expected to give satisfactory
results, the reliability of which may be stated exactly. 

In this book we shall consider only random samples.¹ Arithmetic
means will be discussed in the present chapter. Chapter 25 will deal with
proportions and with certain aspects of the $\chi^2$ (chi-square) test.
Chapter 26 will discuss variances, the analysis of variance, correlation
coefficients, and measures of skewness and kurtosis.

HOW SAMPLE ARITHMETIC MEANS ARE DISTRIBUTED 

Data of the mileage run by each of many thousands of automobile tires
of the same size, quality, and make, used on similar vehicles under
comparable road conditions, show an arithmetic mean ($\bar{X}_{\mathcal{P}}$)
of 15,200 miles and a standard deviation ($\sigma$) of 1,248 miles. If we
select a random sample of 25 tires, we would expect the arithmetic mean of
the random sample to be in the general neighborhood of 15,200 miles. A
second random


¹ A random sample was defined on page 26. The procedures for certain types of
non-random samples are given in H. M. Walker and J. Lev, Statistical Inference,
Henry Holt and Co., New York, 1953, pp» 171-17S; additional references are given 
on pages 177 and 178.

sample of 25 items would not yield exactly the same arithmetic mean as
the first, but it, too, should be in the general neighborhood of 15,200.
Our first concern is with the behavior of arithmetic means of random
samples. Since we shall be dealing with only random samples, and since
we shall not be considering geometric, harmonic, or other means, we shall
simply say sample mean to refer to the arithmetic mean of a random
sample.

The arithmetic mean of sample means. If a number of random 
samples, each of 25 tires, were to be taken from the tire population just 
mentioned, some of the sample means would exceed 15,200 miles and some 
would fall below 15,200 miles. One, or a very few, might happen to be 
exactly 15,200 miles. The arithmetic mean of sample means would tend
to equal $\bar{X}_{\mathcal{P}}$.

Consider a more specific illustration: Walter A. Shewhart² constructed
a population of 998 items, having positive and negative values ranging
from -3.0 to 3.0, and with $\bar{X}_{\mathcal{P}} = 0$. It is not important at
this point that the population was as nearly normal as it was possible to
make it. From this population Shewhart drew 1,000 samples ($k = 1,000$) of 4
items ($N = 4$) each. The arithmetic mean of the 1,000 sample means was
0.014. If a larger number of sample means had been taken, it is reasonable
to believe that the arithmetic mean of the sample means would have
been more nearly zero, since it may be shown that, if all possible samples
($K$) of size $N$ are drawn from a population, the arithmetic mean of the
sample means will equal the population mean.³ That is,

$$\frac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3 + \cdots + \bar{X}_K}{K} = \bar{X}_{\mathcal{P}}.$$
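
The statement about all possible samples can be verified directly for a small case. The sketch below uses a hypothetical population of six values (not Shewhart's data) and enumerates every sample of $N = 3$ drawn without replacement; exact fractions are used so that the equality is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 3, 5, 8, 13, 21]   # hypothetical population, for illustration
N = 3                               # items per sample

pop_mean = Fraction(sum(population), len(population))

# Every possible sample of size N, and the arithmetic mean of each.
sample_means = [Fraction(sum(sample), N) for sample in combinations(population, N)]
K = len(sample_means)               # number of possible samples

mean_of_means = sum(sample_means) / K

print(f"K = {K}, population mean = {pop_mean}, mean of sample means = {mean_of_means}")
```

Each item appears in the same number of samples, which is why the mean of the $K$ sample means reproduces the population mean exactly.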

Skewness of sample means. If sample means are from a population
which has no skewness, the distribution of sample means will not be
skewed. If the population is skewed, the distribution of sample means
will show less skewness, the skewness being inversely related to the size
of the sample, according to the relationship

$$\beta_{1\bar{X}} = \frac{\beta_{1\mathcal{P}}}{N}.$$

Shewhart's population of 998 items had $\beta_{1\mathcal{P}} = 0$. The
distribution of the 1,000 sample means, together with the population, is
shown in

² Walter A. Shewhart, Economic Control of Quality of Manufactured Product, D. Van
Nostrand Co., Inc., New York, 1931, pp. 167, 442-445, and 454-463.

³ See Appendix S, section 24.1.

Chart 24.1. It may be seen that the distribution of the sample means is
nearly symmetrical. Shewhart does not compute the value of
$\beta_{1\bar{X}}$ for the 1,000 sample means, but for the frequency
distribution in class intervals of 0.25, shown in Chart 24.1,
$\beta_{1\bar{X}}$ has been found to be 0.0027.


Chart 24.1. Distribution of Shewhart's Normal Population of 998 Items
and of 1,000 Sample Means for Samples Having $N = 4$. The class intervals
were 0.50 for the population and 0.25 for the sample means. Based on data from
W. A. Shewhart, Economic Control of Quality of Manufactured Product, D. Van Nostrand
Co., Inc., New York, 1931, pp. 167, 442-445, and 454-463. (Vertical axis:
frequencies per 0.25 class interval.)


Chart 24.2 shows the distribution of the arithmetic means of 100
samples of 10 items each and the distribution of the skewed population
from which the samples were drawn. For the population,
$\beta_{1\mathcal{P}} = 0.096$. If all possible samples of $N = 10$ had been
drawn, the skewness of the sample means would have been

$$\beta_{1\bar{X}} = \frac{\beta_{1\mathcal{P}}}{N} = \frac{0.096}{10} = 0.0096.$$

For the 100 samples, $\beta_{1\bar{X}} = 0.0031$. It is clear that the
skewness of the sample means is much less than the skewness in the population.

Shewhart⁴ has drawn samples from a population which is much more
skewed than that shown in Chart 24.2. His right-triangular universe and
the distribution of 1,000 sample means ($N = 4$) are shown in Chart 24.3.
The skewness of the right-triangular universe is indicated by
$\beta_{1\mathcal{P}} = 0.320$.


Chart 24.2. Distribution of Skewed Population of 972 Items and of 100
Sample Means for Samples Having $N = 10$. The population consisted of the
weekly earnings of 972 wage earners. Class intervals were $2.50 for both series.
(Vertical axis: frequencies; horizontal axis: values of $X$ or $\bar{X}$.)

For samples of 4, we would expect the skewness to be about

$$\beta_{1\bar{X}} = \frac{0.320}{4} = 0.080.$$

For the distribution of the 1,000 sample means, the skewness has been
computed to be 0.062. While this value of $\beta_{1\bar{X}}$ is larger than
those just obtained for the other two sets of samples, it must be remembered,
first, that the skewness is much less than that of the population and, second,
that populations as skewed as this are not often encountered.
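
The expected skewness figures quoted above follow from a single division, as the short sketch below shows. The function name is ours; the population values are those given in the text, and the observed values (0.0031 and 0.062) are listed only for comparison.

```python
def beta1_of_sample_means(beta1_population, n):
    """Expected skewness (beta-1) of sample means for samples of n items."""
    return beta1_population / n

cases = [
    # (description, population beta-1, N, beta-1 observed in the drawn samples)
    ("weekly earnings, 972 items", 0.096, 10, 0.0031),
    ("right-triangular universe", 0.320, 4, 0.062),
]

expected_values = []
for name, beta1_pop, n, observed in cases:
    expected = beta1_of_sample_means(beta1_pop, n)
    expected_values.append(expected)
    print(f"{name}: expected {expected:.4f}, observed {observed}")
```

The expected values refer to all possible samples; a limited number of samples, such as the 100 or 1,000 actually drawn, need not reproduce them exactly.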

Kurtosis of sample means. The kurtosis of a distribution of sample
means may be expected to be closer to 3.0 (the value for a normal
distribution) than the kurtosis of the population from which the samples
were taken. The relationship is

$$\beta_{2\bar{X}} - 3 = \frac{\beta_{2\mathcal{P}} - 3}{N}, \quad \text{or} \quad \beta_{2\bar{X}} = \frac{\beta_{2\mathcal{P}} - 3}{N} + 3.$$

For Shewhart's normal population, the value of $\beta_{2\mathcal{P}}$ was 3.0,
and the distribution of sample means (Chart 24.1) would be expected to have
$\beta_{2\bar{X}} = 3.0$. For Shewhart's 1,000 sample means,
$\beta_{2\bar{X}}$ was 2.98.

⁴ The population data are from page 183 of the reference given in footnote 2. The
data of sample means were obtained by correspondence from Dr. Walter A. Shewhart.
All skewness and kurtosis values (except those for the normal population) were
computed by the writers.


Chart 24.3. Distribution of Shewhart's Right-Triangular Population of
820 Items and of 1,000 Sample Means for Samples Having $N = 4$. The class
intervals were 0.1 for the population and 0.2 for the sample means. For source of
data, see footnote 4. (Vertical axis: frequencies per 0.1 class interval.)


Shewhart also constructed a rectangular population,⁵ shown in Chart
24.4A, which is extremely platykurtic, having $\beta_{2\mathcal{P}} = 1.80$.
From this population he obtained 1,000 sample means ($N = 4$), the
distribution of which is also given in Chart 24.4A. This curve looks as if it
might be nearly mesokurtic. The kurtosis of these sample means would be
expected to be

$$\beta_{2\bar{X}} = \frac{\beta_{2\mathcal{P}} - 3}{N} + 3 = \frac{1.80 - 3}{4} + 3 = 2.70.$$

For the 1,000 sample means, $\beta_{2\bar{X}} = 2.99$.

⁵ See footnote 4.

Shewhart did not consider a leptokurtic population, but Alfred J. Kana 
designed such a population of 1,000 items, which is shown in Chart 

Chart 24.4A. Distribution of Shewhart's Rectangular (Platykurtic)
Population of 122 Items and of 1,000 Sample Means for Samples Having $N = 4$.
The class intervals were 0.1 for the population and 0.3 for the sample means. For
source of data, see footnote 4. (Vertical axis: frequencies per 0.1 class
interval; horizontal axis: values of $X$ or $\bar{X}$.)



24.4B. From this population, Kana obtained 400 sample means ($N = 5$),
the distribution of which also appears in Chart 24.4B. The kurtosis
of the population was $\beta_{2\mathcal{P}} = 7.927$. Selecting samples of
five items each could be expected to yield

$$\beta_{2\bar{X}} = \frac{\beta_{2\mathcal{P}} - 3}{N} + 3 = \frac{7.927 - 3}{5} + 3 = 3.985.$$

Only 400 samples were drawn, but for this group of samples it was found
that $\beta_{2\bar{X}} = 4.190$, a value much nearer to 3.0 than the value of
$\beta_{2\mathcal{P}}$.

Chart 24.4B. Distribution of Kana's Leptokurtic Population of 1,000 Items
and of 400 Sample Means for Samples Having $N = 5$. The class intervals were
1.0 for both series. The kurtosis values, given in the text, were computed
from ungrouped data for both series. Data from Alfred J. Kana. (Vertical
axis: frequencies.)

Sample means and the normal curve. From what has been said,
it is clear that the distribution of sample means is normal when those
means have been computed from random samples from a normal population.
If a population is skewed, the skewness present in sample means
drawn from that population will be much less, the skewness being
inversely related to the size of the sample as indicated by

$$\beta_{1\bar{X}} = \frac{\beta_{1\mathcal{P}}}{N}.$$

If a population is leptokurtic or platykurtic, the distribution of sample
means drawn from that population will be more nearly mesokurtic, as
shown by

$$\beta_{2\bar{X}} = \frac{\beta_{2\mathcal{P}} - 3}{N} + 3.$$

As a consequence of these two relationships, statisticians consider
sample means to be distributed normally unless there is reason to believe
that the population from which they were taken departs markedly from
normal.
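
Both relationships can be checked exactly on a small example. The sketch below uses a hypothetical skewed, leptokurtic population of six values and enumerates every ordered sample of $N = 3$ drawn with replacement, which stands in for a population that is very large in relation to $N$; that sampling scheme, the population, and the helper names are our assumptions. The last two lines simply evaluate the kurtosis relationship for the two populations discussed in the text.

```python
from fractions import Fraction
from itertools import product

def betas(values):
    """Return (beta-1, beta-2) from the central moments of a list of values."""
    n = len(values)
    mean = Fraction(1, n) * sum(values)
    m2 = Fraction(1, n) * sum((v - mean) ** 2 for v in values)
    m3 = Fraction(1, n) * sum((v - mean) ** 3 for v in values)
    m4 = Fraction(1, n) * sum((v - mean) ** 4 for v in values)
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2

def kurtosis_of_sample_means(beta2_population, n):
    """Expected kurtosis (beta-2) of sample means for samples of n items."""
    return (beta2_population - 3) / n + 3

population = [0, 0, 0, 1, 1, 3]     # hypothetical population, for illustration
N = 3

# All ordered samples with replacement, and the mean of each.
means = [Fraction(sum(s), N) for s in product(population, repeat=N)]

b1_pop, b2_pop = betas(population)
b1_means, b2_means = betas(means)

print("beta-1:", float(b1_pop), "->", float(b1_means))
print("beta-2:", float(b2_pop), "->", float(b2_means))
print("rectangular population, N = 4:", kurtosis_of_sample_means(1.80, 4))
print("leptokurtic population, N = 5:", kurtosis_of_sample_means(7.927, 5))
```

With exact fractions the enumerated sample means satisfy both relationships identically, whereas a finite run of drawn samples, such as the 400 or 1,000 in the text, will only approximate them.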

Dispersion of sample means. A glance at any of the four preceding
charts will reveal that the dispersion of sample means is much less than
the dispersion of the population from which those sample means came.
The relationship is⁶

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}.$$

For the population data of Chart 24.1, we have $\sigma = 1.0070$ and $N = 4$.
Consequently,

$$\sigma_{\bar{X}} = \frac{1.0070}{\sqrt{4}} = 0.5035.$$


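For the 1,000 sample means, the standard deviation may be computed
using the expression

$$\sqrt{\frac{(\bar{X}_1 - \bar{X}_{\mathcal{P}})^2 + (\bar{X}_2 - \bar{X}_{\mathcal{P}})^2 + \cdots + (\bar{X}_{1,000} - \bar{X}_{\mathcal{P}})^2}{1,000}}.$$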

⁶ See Appendix S, section 24.2. Note that, as shown in the proof, the expression
used above is not valid unless the population is large in relation to $N$.

The value of the standard deviation for the frequency distribution of
sample means, shown in Chart 24.1, is 0.503, which agrees very closely
with the value of 0.5035 that would have been obtained if we could have
considered all possible samples of $N = 4$.
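
The relationship between the dispersion of sample means and the sample size can be sketched briefly. The function name is ours; the first call reproduces the figure for Shewhart's normal population, and the others apply the same expression to the tire-mileage population ($\sigma = 1,248$ miles) for samples of 25 and of 100 tires.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard deviation of sample means for samples of n items."""
    return sigma / sqrt(n)

shewhart = standard_error(1.0070, 4)    # Shewhart's normal population, N = 4
tires_25 = standard_error(1248, 25)     # tire mileage, samples of 25
tires_100 = standard_error(1248, 100)   # tire mileage, samples of 100

print(f"Shewhart, N = 4: {shewhart:.4f}")
print(f"Tires, N = 25:   {tires_25:.1f} miles")
print(f"Tires, N = 100:  {tires_100:.1f} miles")
```

Quadrupling the sample size halves the dispersion of the sample means, since the divisor is the square root of $N$.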



Chart 24.6. Distribution of Sample Arithmetic Means for
$\bar{X}_{\mathcal{P}} = 50$ mm and $\sigma = 8$ mm, When $N = 16$ (Solid
Curve) and When $N = 64$ (Broken Curve). (Horizontal axis: millimeters.)

From the expression

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$$

it is obvious that (1) the greater the dispersion of the population, the
greater the dispersion of sample means taken from that population; and
(2) the larger the size of the samples, the smaller the dispersion of sample
means. These points are illustrated in Chart 24.5, which shows the
distributions of sample means for two different values of $\sigma$ when $N$ is
unchanged, and in Chart 24.6, which shows the distributions of sample
means for two sample sizes from the same population.

SIGNIFICANCE OF THE DIFFERENCE BETWEEN $\bar{X}$ AND $\bar{X}_{\mathcal{P}}$
WHEN $\bar{X}_{\mathcal{P}}$ AND $\sigma$ ARE KNOWN

A difference between $\bar{X}$ and $\bar{X}_{\mathcal{P}}$ that is not
significant. Consider the tire-mileage data referred to previously for which
$\bar{X}_{\mathcal{P}} = 15,200$ miles and $\sigma = 1,248$ miles. If random
samples of 100 tires are to be drawn, we



Chart 24.7. Expected Distribution of Sample Arithmetic Means, from a
Normal Population, Showing the 0.05 and 0.01 Levels.

would expect the sample means to have

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{1,248}{\sqrt{100}} = 124.8 \text{ miles}.$$

Consequently, the sample means would be distributed as shown in Chart
24.7. In this chart, particular attention has been called to the deviations
of $\pm 1.96\sigma_{\bar{X}}$ and $\pm 2.58\sigma_{\bar{X}}$. As may be seen
from the chart, $\pm 1.96\sigma_{\bar{X}}$ cuts off 5 per cent of the area of
the curve in the two tails, while $\pm 2.58\sigma_{\bar{X}}$ cuts off 1 per
cent of the area of the curve in the two tails. These percentages
may be obtained from the table of areas of the normal curve (Appendix
E) which we used in the preceding chapter or, more readily, from Appendix
H, which shows areas in two tails of the normal curve. The two
deviations shown in Chart 24.7 are those which denote, for the normal
curve, the 0.05 level and the 0.01 level. Significance tests make frequent
use of the 0.05 and 0.01 levels, although other levels (for example, 0.001,
0.005, 0.02, and 0.025) are also employed.

One sample of 100 items, allegedly a random sample and supposedly
drawn from the population mentioned in the preceding paragraph, was
found to have $\bar{X} = 15,269$ miles. We are interested in knowing whether
it is reasonable to believe that this sample mean is the arithmetic mean
of a random sample from the population having
$\bar{X}_{\mathcal{P}} = 15,200$ miles and $\sigma = 1,248$ miles. The
difference between $\bar{X}$ and $\bar{X}_{\mathcal{P}}$ is 69 miles. In
order to be able to refer to the normal curve, we express this difference in
terms of $\sigma_{\bar{X}}$, which has already been ascertained to be 124.8
miles.



Chart 24.8. Expected Distribution of Sample Means and Chances of Obtain- 
ing Sample Means Dijffering from X(p by ±: O.SSo^ or More* 

Therefore, 

X ^ 1 - ^ 15,269 - 15,200 ^ ^ 

(Tx 124.8 124.8 " 

Referring to Chart 24.8, we may see the area under the normal curve (the
cross-hatched portion) which is cut off by a deviation of $+0.55\sigma_{\bar{X}}$. From
Appendix G, which shows areas in one tail of the normal curve, this
cross-hatched tail is found to include 29 per cent of the area under the
curve. Since we know that sample means both exceed and fall below $\bar{X}_P$,
we consider also the tail of the normal curve cut off by $-0.55\sigma_{\bar{X}}$, which is
the stippled portion in Chart 24.8. This tail, too, includes 29 per cent of
the area under the curve, and the two tails combined contain 58 per cent
(P = 0.58) of the area under the curve. From this we conclude that,
since a difference of $\pm 0.55\sigma_{\bar{X}}$ may occur so frequently through the
operations of random sampling, there is no adequate basis for thinking that
the sample mean was not the mean of a random sample from the
population under consideration.
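
The test just described can be sketched in a few lines. This is an illustration under the stated conditions (population mean and standard deviation both known), with `statistics.NormalDist` standing in for Appendix G; the function name `two_tail_p` is an assumption of this example.

```python
from math import sqrt
from statistics import NormalDist

def two_tail_p(sample_mean, pop_mean, sigma, n_items):
    """Return the significance ratio and the two-tail probability
    for a sample mean when the population sigma is known."""
    std_error = sigma / sqrt(n_items)
    ratio = (sample_mean - pop_mean) / std_error
    one_tail = 1 - NormalDist().cdf(abs(ratio))   # area beyond |ratio| in one tail
    return ratio, 2 * one_tail

ratio, p_value = two_tail_p(15269, 15200, 1248, 100)
print(round(ratio, 2))     # 0.55
print(round(p_value, 2))   # 0.58
```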

The foregoing involved setting up the hypothesis that the sample
mean was the mean of a random sample from the population having
$\bar{X}_P$ = 15,200 miles and $\sigma$ = 1,248 miles. This hypothesis is referred to
as a "null hypothesis," since it is a hypothesis of no difference between
$\bar{X}$ and $\bar{X}_P$. The next step consisted of testing the hypothesis by
computing a significance ratio $x/\sigma$ and determining the probability of
obtaining a deviation equal to or greater than that observed, as a result of
random sampling. Our test casts much doubt (if P is small) or little doubt
(if P is large) on the hypothesis. Since P was found to be 0.58, our
hypothesis was not impugned.

Note that we did not "prove" the hypothesis. Statistically, a
hypothesis can never be "proven" or "disproven." By means of
repeated experiments which always yield consistent differences, or lack
of them, an investigator might eventually consider a hypothesis false or
valid. Statistical tests, however, can merely cast much or little doubt
upon a hypothesis, thus discrediting or failing to discredit the hypothesis.

A difference between $\bar{X}$ and $\bar{X}_P$ that is significant. Consider
another sample of 100 tires having $\bar{X}$ = 14,738 miles. To test the
hypothesis that this mean is the mean of a random sample from the
population having $\bar{X}_P$ = 15,200 miles and $\sigma$ = 1,248 miles, we compute

$$\frac{x}{\sigma} = \frac{\bar{X} - \bar{X}_P}{\sigma_{\bar{X}}} = \frac{14{,}738 - 15{,}200}{124.8} = \frac{-462}{124.8} = -3.70.$$

Referring to Appendix H, which shows areas in two tails of the normal
curve, we find that P = 0.000216. This is pictured in Chart 24.9.
Since a difference such as that observed could be expected to occur so
infrequently as a result of random sampling, the null hypothesis is not
tenable. The sample mean may have been the mean of a non-random
sample from the population under consideration, it may have been the
mean of a random sample from a different population, or it may have been
the mean of a non-random sample from a different population. In any
event, we feel justified in declaring that it is not (that is, it is extremely
unlikely to be) the mean of a random sample from the population having
$\bar{X}_P$ = 15,200 miles and $\sigma$ = 1,248 miles.

The two tests which we have made were both two-tail (or two-sided)
tests, since we considered either plus or minus differences as tending to
discredit the null hypothesis. Sometimes, as we shall see in later portions
of this text, a positive divergence will tend to discredit a hypothesis, while
a negative difference will not; in such a case, we should consider only the
area in the right tail of the appropriate curve. When a negative
difference tends to discredit a hypothesis, but a positive difference does
not, we take cognizance of the area in the left tail of the curve.⁷
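
The distinction between the left tail, the right tail, and the two tails together may be seen by computing all three areas for the second tire sample. This is a sketch only; the variable names are assumptions of the illustration, and `statistics.NormalDist` is used in place of the appendix tables.

```python
from statistics import NormalDist

std_error = 124.8                       # standard error found earlier, in miles
ratio = (14738 - 15200) / std_error     # about -3.70

normal = NormalDist()
left_tail = normal.cdf(ratio)           # area below the observed ratio
right_tail = 1 - normal.cdf(ratio)      # area above the observed ratio
two_tail = 2 * min(left_tail, right_tail)

# A left-tail test and a two-tail test would both discredit the hypothesis
# here; a right-tail test would not, since the divergence is negative.
print(left_tail < 0.001, right_tail > 0.999, two_tail < 0.001)   # True True True
```

The two-tail area comes out at roughly 0.0002, in agreement with the value read from Appendix H.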

The value of P and significance. We have just considered two
differences, one of which was declared "significant" and one "not
significant." These examples were purposely selected to illustrate
conclusions that would be obvious once P had been determined. How small
should be the value of P in order for a difference to be declared significant?
This is not an easy question to answer,⁸ since the answer depends largely
upon the nature of the phenomenon being considered and the
consequences of being wrong.

Chart 24.9. Expected Distribution of Sample Means and Chances of Obtaining
Sample Means Differing from $\bar{X}_P$ by $\pm 3.70\sigma_{\bar{X}}$ or More.

For the sample having $\bar{X}$ = 14,738 miles, we found P to be 0.000216
and considered the null hypothesis to be discredited. Actually, it is
possible that the hypothesis was true and our conclusion wrong, since
random samples would show a deviation equal to or greater than
$3.70\sigma_{\bar{X}}$ 216 times in a million in the long run.

7 There are also situations in which we may wish to make a two-tail test with
unequal areas in the two tails. See, for example, the illustration given in M. G.
Kendall, The Advanced Theory of Statistics, Vol. II, Charles Griffin and Company
Limited, London, 1948 (Second Edition), p. 99.

8 This relatively innocent-appearing problem involves very complicated aspects.
For another non-technical discussion, see L. H. C. Tippett, Technological
Applications of Statistics, John Wiley and Sons, New York, 1950, pp. 93-95. A more
detailed presentation will be found in H. M. Walker and Joseph Lev, Statistical
Inference, Henry Holt and Co., New York, 1953, pp. 162-167 (concerning means)
and 44-79 (dealing with proportions).

Type I errors. When a null hypothesis is actually true, and when the
difference under consideration is declared not significant (that is, the
hypothesis is not impugned), the conclusion is correct. When a null
hypothesis is actually true, but when the difference involved is declared
significant (that is, the hypothesis is discredited), we say that a "Type I
error" has been made. If we use P = 0.05 as our criterion of significance,
declaring significant all differences having P ≤ 0.05, we shall, when the
hypothesis is true, make a Type I error 1 time out of 20 in the long run; if
we use P = 0.01 as our criterion of significance, declaring significant all
differences having P ≤ 0.01, we will make a Type I error 1 time out of 100
in the long run. It must be clear that, the lower the value of P which is
used as a criterion, the fewer Type I errors that will be made.
Unfortunately, decreasing the proportion of Type I errors serves to
increase the sort of error described in the next paragraph.
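
The long-run frequency of Type I errors can be illustrated by simulation. The sketch below is not a procedure from the text: it assumes the tire-mileage population, draws each sample mean directly from its sampling distribution with the null hypothesis true, and counts how often the difference is declared significant. The function name, the number of trials, and the seed are arbitrary choices of this example.

```python
import random
from statistics import NormalDist

def type_one_rate(level, trials=20000, seed=1):
    """Share of true null hypotheses rejected by a two-tail test at `level`."""
    rng = random.Random(seed)
    pop_mean, std_error = 15200.0, 124.8
    cutoff = NormalDist().inv_cdf(1 - level / 2)
    rejected = 0
    for _ in range(trials):
        # One sample mean, drawn from its sampling distribution under the null hypothesis.
        sample_mean = rng.gauss(pop_mean, std_error)
        if abs(sample_mean - pop_mean) / std_error >= cutoff:
            rejected += 1
    return rejected / trials

rate_05 = type_one_rate(0.05)
rate_01 = type_one_rate(0.01)
print(rate_05, rate_01)
```

With a large number of trials the two shares should settle near 0.05 and 0.01 respectively.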

Type II errors. When a null hypothesis is actually false and when the
difference under consideration is declared significant, the conclusion is
correct. When a null hypothesis is actually false, but when the difference
being examined is declared not significant, we say that a "Type II error"
has been made. If we use P = 0.05 as the criterion, we cannot say how
frequently Type II errors will occur, since we cannot know how false the
hypothesis may be. The sample (or samples) may be a non-random one
from the population involved, or the sample may be a random or
non-random one from a population other than the one involved. In this
situation, we can merely say that, if we use P = 0.05 as a criterion, we
should expect to make fewer Type II errors than if P = 0.01 is employed.⁹
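
Once a definite alternative hypothesis is set up, the probability of a Type II error can be computed. The sketch below assumes a right-tail test on the normal curve and a true population mean lying a stated number of standard errors above the hypothetical one. The shift of 2.5 standard errors is chosen purely for illustration and is not a figure from the text; the function name is likewise an assumption.

```python
from statistics import NormalDist

def type_two_probability(shift, level=0.05):
    """Probability of a Type II error in a right-tail test.

    `shift` is the distance of the true population mean above the
    hypothetical one, measured in standard errors of the mean.
    """
    # Ratio that cuts off `level` of the area in the right tail under the null hypothesis.
    cutoff = NormalDist().inv_cdf(1 - level)
    # The hypothesis is accepted whenever the observed ratio falls below the cutoff;
    # under the alternative the ratio is normally distributed around `shift`.
    return NormalDist(mu=shift).cdf(cutoff)

beta_05 = type_two_probability(2.5, level=0.05)
beta_01 = type_two_probability(2.5, level=0.01)
print(round(beta_05, 2), round(beta_01, 2))
```

Under these assumptions the stricter criterion gives the larger probability of a Type II error, and moving the true mean farther to the right reduces it.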

9 We may, however, state the probability of Type II errors if we set up an
alternative hypothesis. The left curve in the accompanying diagram represents a test
(using 0.05 in the right tail as the criterion) of the hypothesis that $\bar{X}$ is the mean of
a random sample from a population having $\bar{X}_P$ as its mean, only positive values of
$x/\sigma$ serving to discredit the hypothesis. Any value of $x/\sigma$ falling between $-\infty$
and the point that cuts off the 0.05 right tail would cause us to accept the hypothesis.
If the true value of $\bar{X}_P$ is that shown at the center of the right curve, then the proba-

Choice of criterion. For practical purposes, the probability which is to
serve as the criterion of significance should be chosen in the light of the
type of error which should be avoided. If Type I errors should be as few
as possible, P should be very small. If Type II errors should be few, P
should be larger. Consider the following examples:

An agricultural experiment station has developed a new hay crop 
which is believed to be superior to existing crops, such as alfalfa, lespedeza,
clover, and the like. In order for a farmer to raise the new crop, he must 
invest heavily in special machinery for sowing the seed and for harvesting. 
If, in the comparison of the new crop with the present crops, a Type I 
error were made, farmers who planted the new crop would incur heavy 
expenses but would find the new hay to be no better than that formerly 
fed to their stock. As a result, the farmers would have experienced heavy 
losses. If a Type II error were made, the new crop, though better, would 
not be introduced and, while farmers would have failed to gain the advan- 
tages that would have resulted, they would have incurred no actual loss. 
In such a situation as this, P should be very small, say 0.01 or 0.001, to 
warrant one in declaring the observed difference to be significant. 

Not long ago the United States Food and Drug Administration acted 
against a chemical manufacturing concern, alleging that digitalis sold by
the firm was half-strength. The difficulty said to be involved was that
persons using this digitalis and becoming accustomed to it might
experience serious consequences if they shifted to a full-strength digitalis. In
the case of a drug such as this, it is important that the day-to-day pro- 
duction be kept in conformance with the standard (population). As 
tests are made of each batch, it is essential that no batch should be 
appreciably stronger or weaker than the population. If, in testing a 
batch, a Type I error were made (that is, if the batch is said to differ 
significantly from the population when it actually does not), the result 
would be that the batch would be discarded or reprocessed. On the 
other hand, if a Type II error were made, we would be stating that the 
batch did not differ significantly from the population when a real differ- 
ence was actually present, and serious harm, even death, might result to 


bility of a Type II error is represented by the shaded area, which is about 0.20. Other
alternative hypotheses may also be set up. Note that if the true $\bar{X}_P$ is farther to the
right, the probability of Type II errors is decreased; if the true $\bar{X}_P$ is farther to the
left, the probability of Type II errors is increased. From the chart it is also clear
that, if the black area (representing the probability of Type I errors if $\bar{X}_P$ at the left
is the true mean) is decreased, the probability of Type II errors (if the true mean is as
noted on the chart) is increased; if the black area is increased, the shaded area is
decreased. For a further discussion see the second reference mentioned in note 8
and also A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill Book
Company, New York, 1950, pp. 245-267.

persons using the drug. In such a situation it is clearly more important
to avoid Type II errors than Type I errors, and P should therefore be
fairly large, say 0.10 or, preferably, larger.

There will be frequent occasions when one cannot say whether Type I
or Type II errors are more serious. If one is testing the difference of the
mean IQ's of male cooks and of male dishwashers,¹⁰ such a situation
arises. Here the investigator might be satisfied to use P = 0.05 as a
criterion.

From the foregoing it should be clear that the same value of P should
not be used as a criterion for all tests. The appropriate level will depend
on the circumstances. One should never state that a result is significant
or not significant without also giving the value of P, which may ordinarily
be read with sufficient accuracy from existing tables, interpolation being
rarely called for. Alternatively, one may say: "significant at the 0.01
(or other) level." Sometimes an investigator will say: "Significant at the
0.05 (or other) level but not significant at the 0.02 (or other) level."
Stating the value of P allows the reader to draw his own conclusion
concerning significance.

Another important consideration is the desirability of deciding, in
advance of attacking a problem, the criterion of significance that will be
used. This avoids the possibility that the P value which is obtained may
influence one in setting his criterion. This is particularly likely to happen
if one "hopes" for a significant or non-significant difference.

Probability and everyday occurrences. The reader may feel that
the conclusions regarding significance and based upon probabilities
involve a new basis of thinking which he has not encountered before.
This may be true, in that we are using some of the most elementary ideas
of mathematical probability.¹¹ However, basing decisions upon
probability of some sort has been an everyday occurrence throughout
everyone's life. The student studying for an examination considers the parts
of the course about which the instructor is likely to ask questions and the
portions not likely to be covered in the examination. This crude subjective
sort of probability serves as a guide to him as he reviews. The
baseball coach must consider the chances (or "play the percentages," as
the radio commentators say) before he orders a squeeze play or before he
puts in a right-handed batting pinch hitter, batting at 0.240, to replace a
left-handed batting regular, batting at 0.290, to face a left-handed pitcher.
Before one approaches his boss for a raise, he usually considers whether
today, tomorrow, or some other day will likely be most propitious. On
a much larger scale, unions are not likely to demand wage increases during
the slackest months of the year or during a depression. Similarly,
utilities are not apt to ask for rate increases when business is in the
doldrums.

10 Differences between two sample means are discussed on pages 651-657.

11 See, for example, James G. Smith and Acheson J. Duncan, Elementary Statistics
and Applications, McGraw-Hill Book Co., Inc., New York, 1944, Chapter 10.

Size of sample. Occasionally one may wish to know the sample size
which will give a specified degree of assurance that sample means will fall
within designated limits. For the data of tire mileage, where $\bar{X}_P$ =
15,200 miles and $\sigma$ = 1,248 miles, what sample size would result in sample
means varying within ±200 miles for 98 out of 100 samples? The answer
is obtained by substituting in the expression

$$\frac{x}{\sigma} = \frac{\bar{X} - \bar{X}_P}{\sigma_{\bar{X}}}$$

the known and designated values and the value of $x/\sigma$ (from Appendix H
or the last row of Appendix I) which cuts off two tails which include two
per cent of the area of the normal curve. Since the $x/\sigma$ value is 2.326, we
have

$$2.326 = \frac{200}{1{,}248/\sqrt{N}},$$

$$200\sqrt{N} = (2.326)(1{,}248) = 2{,}902.8,$$

$$\sqrt{N} = 14.5,$$

$$N = 210.$$
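
The same substitution may be carried out without intermediate rounding. In the sketch below the function name `sample_size` is an assumption of the illustration, and `statistics.NormalDist` replaces the appendix lookup. Carrying full precision gives a value a little above 210, so that rounding up to the next whole item would call for 211; the figure of 210 above results from rounding $\sqrt{N}$ to 14.5 before squaring.

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma, half_width, share_inside):
    """Sample size for which `share_inside` of sample means fall within
    plus or minus `half_width` of the population mean (sigma known)."""
    tail_area = 1 - share_inside                       # area left in the two tails
    ratio = NormalDist().inv_cdf(1 - tail_area / 2)    # about 2.326 for 2 per cent
    return (ratio * sigma / half_width) ** 2

n_exact = sample_size(1248, 200, 0.98)
n_needed = ceil(n_exact)
print(round(n_exact, 1), n_needed)   # 210.7 211
```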


SIGNIFICANCE OF THE DIFFERENCE BETWEEN $\bar{X}$
AND $\bar{X}_P$ WHEN $\sigma$ IS NOT KNOWN

The preceding discussion has dealt only with the procedure which is
applicable when $\bar{X}_P$ and $\sigma$ are known. It is very unusual for population
values to be available. This will be obvious if we enumerate the most
important conditions under which population values may be known.
They are:

(1) A complete census may have been taken. Thus, from the most
recent United States census $\bar{X}_P$ and $\sigma$ could be computed for ages of all
persons enumerated. (Note that the rounding tendency, mentioned on
pages 22-23, would affect the accuracy of these, or any other, age figures
not based on correctly reported dates of birth.)

(2) Population values may be known as the result of extensive
experience. This is the type of situation illustrated by the tire-mileage data.

(3) Much like the preceding is the setting up of a "control population"
to serve as a standard in quality control. Here, many units are
manufactured under carefully controlled conditions, and the statistical values
computed from these units are treated as population data. Day-to-day
production figures are then compared with the population data.

(4) Population values may be known or assumed upon the basis of
hypothesis or theory. Cases are encountered most frequently when
dealing with proportions rather than means. In a test to ascertain
whether tea drinkers could differentiate between tea sweetened with
sugar and with saccharine, the population proportions might be assumed
to be 0.50 for each sweetening agent. In a preference test for four
brands of coffee, the population proportions would be taken as 0.25 for
each brand.

TABLE 24.1

Breaking Strength of 10 Specimens of 0.104-Inch Diameter Hard-drawn
Copper Wire

              Breaking strength
  Specimen    in pounds, X           X^2
      1             578             334,084
      2             572             327,184
      3             570             324,900
      4             568             322,624
      5             572             327,184
      6             570             324,900
      7             570             324,900
      8             572             327,184
      9             596             355,216
     10             584             341,056
    Total         5,752           3,309,232

Data from American Society for Testing Materials, Supplements to 1933
A.S.T.M. Manual on Presentation of Data, "Supplement A: Presenting Plus and
Minus Limits of Uncertainty of an Observed Average," p. 1, reprinted from
Proceedings of the American Society for Testing Materials, Vol. [illegible],
Part 1, Philadelphia, 1936.

$$\bar{X} = \frac{\Sigma X}{N} = \frac{5{,}752}{10} = 575.2 \text{ pounds}.$$

$$\hat{\sigma} = \sqrt{\frac{3{,}309{,}232}{9} - \frac{(5{,}752)^2}{10 \cdot 9}} = \sqrt{75.73} = 8.70 \text{ pounds}.$$

A difference between $\bar{X}$ and $\bar{X}_P$ that is not significant. Tests
have been made of the breaking strength of ten pieces of hard-drawn
copper wire, as shown in Table 24.1. The arithmetic mean of the ten
values is 575.2 pounds. With 0.01 as our criterion, let us test the
hypothesis that $\bar{X}$ = 575.2 pounds is the mean of a random sample from
a population having $\bar{X}_P$ = 577.0 pounds. Now we do not know $\sigma$, and,
since we lack $\sigma$, we must make an estimate of $\sigma$ from the data of the
sample. This estimate is obtained from the expression¹²

$$\hat{\sigma} = \sqrt{\frac{\Sigma x^2}{N - 1}},$$

which may be written

$$\hat{\sigma} = \sqrt{\frac{\Sigma X^2}{N - 1} - \frac{(\Sigma X)^2}{N(N - 1)}} \quad \text{for ungrouped data,}$$

$$\hat{\sigma} = \sqrt{\frac{\Sigma f(d')^2}{N - 1} - \frac{(\Sigma f d')^2}{N(N - 1)}} \quad \text{for grouped data.}$$

$\hat{\sigma}^2$ is called an "unbiased" estimate of $\sigma^2$, since,¹³ for K random samples
of N items each,

$$\frac{\hat{\sigma}_1^2 + \hat{\sigma}_2^2 + \cdots + \hat{\sigma}_K^2}{K} \to \sigma^2 \quad \text{as } K \to \infty.$$

$s^2$ is not an unbiased estimate of $\sigma^2$, since

$$\frac{s_1^2 + s_2^2 + \cdots + s_K^2}{K} \to \frac{N - 1}{N}\,\sigma^2 \quad \text{as } K \to \infty.$$

Now that we have $\hat{\sigma}$, we are in a position to make an estimate of
$\sigma_{\bar{X}}$. This is¹⁴

$$\hat{\sigma}_{\bar{X}} = \frac{\hat{\sigma}}{\sqrt{N}}.$$

For the data of breaking strength of copper wire, the computation of $\hat{\sigma}$
is shown below Table 24.1, and

$$\hat{\sigma}_{\bar{X}} = \frac{8.70}{\sqrt{10}} = 2.75 \text{ pounds}.$$

12 The basic expression for $\hat{\sigma}$ is developed in Appendix S, section 24.3. The forms
for ungrouped and for grouped data are obtained from this basic expression by the
same procedure as that given in Appendix S, section 10.2.

13 See Appendix S, section 24.3.

14 If $s$ is known for a sample, it may be converted into $\hat{\sigma}$ by use of

$$\hat{\sigma} = s\sqrt{\frac{N}{N - 1}}.$$

However, such a conversion is not necessary, since we can write

$$\hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{N - 1}}.$$

It must be clear that, as N increases, the numerical difference between $s$ and $\hat{\sigma}$
becomes of negligible importance. Nevertheless, it is incorrect to use $s$ as an
estimate of $\sigma$.
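
The estimates for the copper-wire data may be reproduced directly from the ten observations. The sketch below is an illustration only; the variable names are assumptions. It also checks that the footnoted form $s/\sqrt{N - 1}$ gives the same estimated standard error as $\hat{\sigma}/\sqrt{N}$.

```python
from math import sqrt

# Breaking strengths in pounds, from Table 24.1.
strengths = [578, 572, 570, 568, 572, 570, 570, 572, 596, 584]

n_items = len(strengths)
total = sum(strengths)                       # sum of X
total_sq = sum(x * x for x in strengths)     # sum of X squared
mean = total / n_items

# Estimate of the population standard deviation (divisor N - 1), ungrouped form.
sigma_hat = sqrt(total_sq / (n_items - 1) - total ** 2 / (n_items * (n_items - 1)))

# Sample standard deviation (divisor N), for comparison.
s = sqrt(total_sq / n_items - mean ** 2)

std_error_hat = sigma_hat / sqrt(n_items)    # estimated standard error of the mean
std_error_alt = s / sqrt(n_items - 1)        # the same quantity computed from s

print(mean, round(sigma_hat, 2), round(std_error_hat, 2))   # 575.2 8.7 2.75
```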


We may now compute the significance ratio

$$t = \frac{\bar{X} - \bar{X}_P}{\hat{\sigma}_{\bar{X}}}.$$

This significance ratio differs from those previously used because the
denominator is an estimate of $\sigma_{\bar{X}}$. Because of this substitution, we are
no longer in a position to refer to the normal curve, but must make use
of the t distribution, which, though symmetrical, is more widely dispersed
than is the normal curve. This may be seen in Chart 24.10. The spread
of the t distribution depends upon the number of "degrees of freedom"
(n) present, the dispersion being greatest for n = 1 and decreasing as n
increases. As n approaches infinity, the t distribution approaches the
normal distribution as a limit. This tendency is apparent from a look
at Chart 24.10. For significance tests involving a single sample mean,
such as the one under consideration, n = N - 1 because we used the
deviations of N values about their own mean in order to compute $\hat{\sigma}$. In
other words, we employed, not N, but N - 1 independent deviations.

For the data of breaking strength of copper wire,

$$t = \frac{\bar{X} - \bar{X}_P}{\hat{\sigma}_{\bar{X}}} = \frac{575.2 - 577.0}{2.75} = \frac{-1.8}{2.75} = -0.65.$$

The value of P is ascertained by referring to Appendix I for n = N - 1 =
10 - 1 = 9 and t = 0.65 (the sign being disregarded). This appendix table
is somewhat different from the preceding table of the normal curve. Both
tables show areas in two tails of the respective distributions, but
Appendix H shows values of P for selected values of $x/\sigma$, while Appendix I
shows values of t for specified values of n and P. From Appendix I it is
seen that 0.50 < P < 0.60, and we conclude that there is no significant
difference between $\bar{X}$ and $\bar{X}_P$. Chart 24.11, which shows a t distribution
for 9 degrees of freedom, illustrates what has been done.
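
Where a t table is not at hand, the two-tail probability can be approximated numerically. The sketch below is one possible approach, not the method of Appendix I: it integrates the usual density of the t distribution by Simpson's rule, and the function names and the number of integration steps are assumptions of this example.

```python
from math import gamma, pi, sqrt

def t_density(t, df):
    """Ordinate of the t distribution with `df` degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_two_tail(t, df, steps=2000):
    """Area in the two tails beyond +|t| and -|t| (Simpson's rule, `steps` even)."""
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_half = total * h / 3          # area between 0 and |t|
    return 1 - 2 * central_half

t_ratio = (575.2 - 577.0) / 2.75          # about -0.65
p_value = t_two_tail(t_ratio, df=9)
print(round(t_ratio, 2), 0.50 < p_value < 0.60)   # -0.65 True
```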

Chart 24.10. Comparison of the t Distribution for n = 2, n = [illegible], and
n = 20 with the Normal Distribution. The values of t, shown above, are $x/\sigma$ values
for the normal curve. The ordinates of the t distribution are obtained from the
expression

$$y = \frac{\left(\frac{n-1}{2}\right)!}{\left(\frac{n-2}{2}\right)!}\sqrt{\frac{2}{n}}\left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}.$$

This gives a maximum ordinate which approaches 1.0 as n approaches infinity, and
thus is comparable to the expression

$$y = e^{-\frac{x^2}{2\sigma^2}}$$

for the normal curve. The computation of
$\left(\frac{n-1}{2}\right)! \Big/ \left(\frac{n-2}{2}\right)!$ may be clarified by an
illustration. If n = 11, the numerator is 5!, while the denominator is 4.5!. The
value of 4.5! is given by $4.5 \times 3.5 \times 2.5 \times 1.5 \times 0.5 \times \sqrt{\pi}$.

Chart 24.11. The t Distribution for n = 9, Showing Probability of Obtaining
t = ±0.65 or More. Between 0.50 and 0.60 of the area under the curve is in the
two tails.

A difference between $\bar{X}$ and $\bar{X}_P$ that is significant. Norman C.
Wiley¹⁵ gives data of tests of strength of three-inch manila rope, showing,
for one sample, N = 16, $\bar{X}$ = 9,959 pounds, and s = 248 pounds. Using
the 0.01 level as a criterion, we shall test the hypothesis that $\bar{X}$ = 9,959
pounds is the mean of a random sample from a population having
$\bar{X}_P$ = 10,148 pounds. In order to obtain $\hat{\sigma}_{\bar{X}}$, we make use of the
expression given in footnote 14,

$$\hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{N - 1}} = \frac{248}{\sqrt{15}} = \frac{248}{3.873} = 64.03.$$

Then we compute

$$t = \frac{\bar{X} - \bar{X}_P}{\hat{\sigma}_{\bar{X}}} = \frac{9{,}959 - 10{,}148}{64.03} = \frac{-189}{64.03} = -2.95.$$

From the t table of Appendix I, it appears that P is almost exactly 0.01,
and we reject the hypothesis. The foregoing is shown graphically in
Chart 24.12. Note that, if we had used the normal table of Appendix H,
the probability would have been misleadingly small, about 0.003! The
difference in the two probabilities would have been much less if the sample
had been larger. As may be seen in Chart 24.10 and in Appendix I, the
t distribution seems to begin to approximate the normal distribution at
about n = 20. Some statisticians customarily refer to the normal table
when n ≥ 30, but this seems to have been due to the fact that, for some
time, the available t tables gave no values of t between n = 30 and
n = ∞. Appendix I lists t values for n = 30, 40, 60, 120, and ∞. It is
best to use the t table in all cases where $\hat{\sigma}$ has been used as an estimate
of $\sigma$.

15 The sample data are from Statistical Methods as an Aid in Revising Specifications,
by N. C. Wiley, a preprint of a paper delivered at the forty-first annual meeting of
the American Society for Testing Materials.
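
The contrast between the two probabilities for the manila-rope sample can be verified numerically. The sketch below is illustrative only: the t probability is obtained by Simpson's-rule integration of the t density (an approach assumed for this example, not the construction of Appendix I), and the normal probability comes from `statistics.NormalDist`.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist

def t_density(t, df):
    """Ordinate of the t distribution with `df` degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_two_tail(t, df, steps=2000):
    """Area in the two tails beyond +|t| and -|t| (Simpson's rule, `steps` even)."""
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1 - 2 * (total * h / 3)

n_items, sample_mean, s, pop_mean = 16, 9959, 248, 10148
std_error_hat = s / sqrt(n_items - 1)              # about 64.03
t_ratio = (sample_mean - pop_mean) / std_error_hat # about -2.95

p_t = t_two_tail(t_ratio, df=n_items - 1)
p_normal = 2 * (1 - NormalDist().cdf(abs(t_ratio)))
print(round(p_t, 3), round(p_normal, 3))           # 0.01 0.003
```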
Chart 24.12. The t Distribution for n = 15, Showing Probability of Obtaining
t = ±2.95 or More. Almost exactly 0.01 of the area under the curve is in the
two tails.

Confidence limits of $\bar{X}_P$. In the illustration just given, it was
concluded that the sample mean was not the mean of a random sample from
a population having $\bar{X}_P$ = 10,148 pounds. From a knowledge of the
sample alone, what can be said about the limits within which $\bar{X}_P$ may be
expected to occur? We want two values for $\bar{X}_P$, which we shall call
$\bar{X}_{P_1}$ and $\bar{X}_{P_2}$ and which will be, respectively, smaller than and larger than
$\bar{X}$. These are the "confidence limits" of $\bar{X}_P$. The first step consists of
deciding how often we are willing to be wrong in our statement of
confidence limits. Suppose that we can allow ourselves to be wrong not
more than 5 times in 100. In that case, we want the 95 per cent
confidence limits. These limits are obtained by determining:

(1) the value of $\bar{X}_{P_1}$ so located that $\bar{X}$ cuts off the upper 2½ per cent
tail of the distribution of sample means around $\bar{X}_{P_1}$, and

(2) the value of $\bar{X}_{P_2}$ so located that $\bar{X}$ cuts off the lower 2½ per cent
tail of the distribution of sample means around $\bar{X}_{P_2}$.

Both of these values may be had from the following expression, in
which we substitute the already computed values of $\bar{X}$ and $\hat{\sigma}_{\bar{X}}$ and the
t value for the appropriate confidence limits:

$$\bar{X} = \bar{X}_P \pm t\hat{\sigma}_{\bar{X}}.$$

Since we want the 95 per cent confidence limits, and since n = 15, the
value of t (from Appendix I) is 2.131. We have, then

$$9{,}959 = \bar{X}_P \pm (2.131)(64.03),$$

$$\bar{X}_P = 9{,}959 \pm 136.4 = 9{,}822.6 \text{ and } 10{,}095.4 \text{ pounds}.$$

The foregoing procedure is illustrated in Chart 24.13.
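
The computation of the limits may be sketched as follows. The t value is taken from the table as given above rather than computed, and the function name `confidence_limits` is an assumption of this illustration. Small differences in the last decimal place arise because the text rounds the half-width to 136.4 before adding and subtracting.

```python
from math import sqrt

def confidence_limits(sample_mean, s, n_items, t_value):
    """Lower and upper confidence limits of the population mean,
    using the estimated standard error s / sqrt(N - 1)."""
    std_error_hat = s / sqrt(n_items - 1)
    margin = t_value * std_error_hat
    return sample_mean - margin, sample_mean + margin

# 95 per cent limits for the manila-rope sample; t = 2.131 for n = 15.
lower, upper = confidence_limits(9959, 248, 16, t_value=2.131)
print(round(lower), round(upper))   # 9823 10095
```

Substituting the tabled t value for another level, such as 99 per cent, gives the corresponding wider limits.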
We are not sure that the population mean falls within the limits just
given, but we are 95 per cent confident that it does so. In other words,
if many determinations of 95 per cent confidence limits are made, we can
expect those limits to include the population value 95 times out of 100
and to exclude the population value 5 times in 100. Roger P. Doyle
computed the 95 per cent confidence limits of $\bar{X}_P$ for each of Shewhart's
1,000 samples from a normal population. Using $\bar{X}$, $\hat{\sigma}_{\bar{X}}$, and n = 3 for
each sample, he ascertained 1,000 pairs of confidence limits and noted, for
each pair, whether they did or did not include $\bar{X}_P$ = 0. His confidence
limits were right in 951 instances, wrong in 49.
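
An experiment of this kind is easily imitated by simulation. The sketch below does not use Shewhart's actual samples: it assumes samples of four items (which is what n = 3 degrees of freedom implies) drawn from a normal population with mean 0 and standard deviation 1, and it uses 3.182, the tabled 0.05 value of t for 3 degrees of freedom. The function name, the number of trials, and the seed are arbitrary.

```python
import random
from math import sqrt

def coverage(trials=20000, n_items=4, t_value=3.182, seed=7):
    """Share of 95 per cent confidence limits that include the true mean of 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
        centre = sum(sample) / n_items
        sum_sq_dev = sum((x - centre) ** 2 for x in sample)
        sigma_hat = sqrt(sum_sq_dev / (n_items - 1))     # divisor N - 1
        margin = t_value * sigma_hat / sqrt(n_items)
        if centre - margin <= 0.0 <= centre + margin:
            hits += 1
    return hits / trials

share = coverage()
print(share)
```

The share of limits that include the population mean should come out close to 0.95, much as in the count of 951 out of 1,000 reported above.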

While the preceding illustration obtained 95 per cent confidence limits,
any desired limits may be computed, by merely substituting the
appropriate t value, together with the values of $\bar{X}$ and $\hat{\sigma}_{\bar{X}}$ obtained from the
sample. Limits such as 99.9, 99.8, 99, 98, 96, 95, and 90 are often used.
Confidence limits representing less than 90 per cent confidence are not
often wanted, since they do not express a very high degree of confidence.

The determination of confidence limits for proportions, sample
variances ($s^2$ or $\hat{\sigma}^2$), and correlation coefficients will be discussed in the two
following chapters. For these measures, as well as for arithmetic means,
the statistical worker should carefully consider the maximum and
minimum possible values for the measure in question. Occasionally, the very
nature of the variable sets limits, beyond which values cannot occur, and
which should take precedence over computed confidence limits.

The expression for determining the confidence limits of $\bar{X}_P$ was written

$$\bar{X} = \bar{X}_P \pm t\hat{\sigma}_{\bar{X}}$$

rather than

$$\bar{X}_P = \bar{X} \pm t\hat{\sigma}_{\bar{X}},$$

which would have given the same results. The purpose of doing this was
to stress the fact that sample means are distributed around $\bar{X}_P$. Chart
24.13 also attempts to make this clear. There is no such thing as a
distribution of population means around $\bar{X}$.

The illustrations given on the preceding pages all involved $\hat{\sigma}_{\bar{X}}$ and
the t distribution. It may be well to stress the point that variations in
the value of t occur because of sampling variations of $\hat{\sigma}$ as well as because
of sampling variations of $\bar{X}$. A large value of t (and therefore a small
P value) may result from the fact that $\bar{X}$ differs greatly from $\bar{X}_P$, or
because $\hat{\sigma}$ is smaller than $\sigma$, or both. A small value of t (and therefore a
large P value) may occur because $\bar{X}$ closely approximates $\bar{X}_P$, or because
$\hat{\sigma}$ exceeds $\sigma$, or both. When $\sigma$ is known, the only sampling variations
present are those of $\bar{X}$.

SIGNIFICANCE OF THE DIFFERENCE BETWEEN TWO
SAMPLE MEANS

Independent samples. From archaeological excavations conducted
at a certain site, 16 lower first molars were recovered.¹⁶ We do
not have the measurements of each of the 16 teeth, but we know that
$\bar{X}_1$ = 13.57 millimeters and $s_1$ = 0.72 millimeters. From a nearby site,
9 lower first molars were taken with $\bar{X}_2$ = 13.06 and $s_2$ = 0.62
millimeters. Using P = 0.05 as a criterion, is there a significant difference
in the mean length of these two groups of lower first molars? To make
this test, we set up the null hypothesis that the two sample means are
from the same population in regard to $\bar{X}_P$, and we test this hypothesis by
determining the probability of t, where t is the ratio of $\bar{X}_1 - \bar{X}_2$ to an
estimate of the standard error of the difference between the two sample
means.

As shown in Appendix S, section 24.4, the standard error of the
difference between two sample means is given by

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2},$$

provided that the two samples are independent. Non-independent
samples are considered later in this chapter. The expression just given may
be written¹⁷

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma^2}{N_1} + \frac{\sigma^2}{N_2}} = \sigma\sqrt{\frac{1}{N_1} + \frac{1}{N_2}}.$$

We cannot make use of this formula for our problem, since we do not
know the value of $\sigma$. (If we knew $\sigma$, we would almost certainly know
$\bar{X}_P$ as well, since $\sigma$ is computed around $\bar{X}_P$. If we knew $\bar{X}_P$, it would be
more meaningful to compare $\bar{X}_1$ and $\bar{X}_2$ with $\bar{X}_P$ than to compare the two
sample means with each other.) Consequently, we make an estimate of
the value of $\sigma$ from the information given by the two samples. This
estimate¹⁸ is

$$\hat{\sigma}_{1+2} = \sqrt{\frac{\Sigma x_1^2 + \Sigma x_2^2}{N_1 - 1 + N_2 - 1}}.$$

16 Based upon illustrative figures used in a lecture by Professor Egon Pearson at
Columbia University.

17 The assumption is made that the two samples are from the same population in
regard to variance. This assumption is not unreasonable for our problem, since
an F test, described in Chapter 26, reveals that there is not a significant difference
between $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$. When two samples are believed to be from populations of
unequal variance, and when $N_1 = N_2$, or when $N_1 \neq N_2$ and both are large, an
approximate test may be made by using

$$\hat{\sigma}_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\hat{\sigma}_1^2}{N_1} + \frac{\hat{\sigma}_2^2}{N_2}}.$$

For a discussion of procedures when the population variances are unequal, see Maurice
G. Kendall, The Advanced Theory of Statistics, Charles Griffin and Co., Ltd., London,
1948, Vol. II, pp. 111-11[illegible].
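When the individual observations are available for each sample, as is
usually the case, we may compute

$$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N} \quad \text{for ungrouped data, or}$$

$$\Sigma x^2 = \Sigma f(d')^2 - \frac{(\Sigma f d')^2}{N} \quad \text{for grouped data.}$$

For the problem at hand, we do not have the individual observations, but
we do have $s_1$ and $s_2$. Since

$$s_1 = \sqrt{\frac{\Sigma x_1^2}{N_1}} \quad \text{and} \quad s_2 = \sqrt{\frac{\Sigma x_2^2}{N_2}},$$

$$\Sigma x_1^2 = N_1 s_1^2 \quad \text{and} \quad \Sigma x_2^2 = N_2 s_2^2.$$

We therefore compute

$$\Sigma x_1^2 = 16(0.72)^2 = 8.29;$$

$$\Sigma x_2^2 = 9(0.62)^2 = 3.46.$$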

The estimated value of <r is then obtained. 


. / 8.29 + 3.46 „ , 


+ 

The estimated standard error of the difference between the two means, 
may now be computed: 


'■S 


Ni Ns 


= ^1+2 ■ 

= 0.715 + i = 0.298. 


#? 4.2 is a weighted average of the two values for the separate samples. See 
Appendix S, section 24.5. Section 24.6 shows that when Ni « iVa, 


<^’14-2 




+ 


1 l&l , 

— "r 


When more than two samples are involved, the estimate of o*®' is given by 

Sr* 4- 2 jc| 4" 2®* 4* ♦ • • 

Ml 1 4 - Nt - 1 + iV'a ^14.../ 

We shall mahe use of this expression in connection with the discussion of analysis of 
variance in Chapter 26. 
Finally we may obtain the desired significance ratio,

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma}_{\bar{X}_1-\bar{X}_2}} = \frac{13.57 - 13.06}{0.298} = \frac{0.51}{0.298} = 1.71.$$

From the first set of data, we have $n_1 = N_1 - 1 = 16 - 1 = 15$ degrees
of freedom; from the second set, $n_2 = N_2 - 1 = 9 - 1 = 8$. Therefore,
$n = n_1 + n_2 = 23$. Note that one degree of freedom was lost when
$\Sigma x_1^2$ was computed about $\bar{X}_1$ and another degree was lost when $\Sigma x_2^2$ was
computed about $\bar{X}_2$. From the t table of Appendix I, we find $P \approx 0.10$,
and we consider the difference between $\bar{X}_1$ and $\bar{X}_2$ not significant. Chart
24.14 illustrates the foregoing.
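The arithmetic of this test can be retraced with a short Python sketch. It assumes, as the text does, that $s_1$ and $s_2$ were computed with divisor $N$, so that $\Sigma x^2 = N s^2$; the function name `pooled_t_from_summary` is illustrative and does not come from the text.

```python
import math

def pooled_t_from_summary(n1, mean1, s1, n2, mean2, s2):
    """Return (sigma_hat, standard_error, t, degrees_of_freedom).

    s1 and s2 are standard deviations computed with divisor N, so the
    sum of squared deviations is recovered as N * s**2.
    """
    sum_sq1 = n1 * s1 ** 2                      # sum of x1^2 = N1 * s1^2
    sum_sq2 = n2 * s2 ** 2                      # sum of x2^2 = N2 * s2^2
    dof = (n1 - 1) + (n2 - 1)                   # n = n1 + n2
    sigma_hat = math.sqrt((sum_sq1 + sum_sq2) / dof)
    std_err = sigma_hat * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / std_err
    return sigma_hat, std_err, t, dof

sigma_hat, std_err, t, dof = pooled_t_from_summary(16, 13.57, 0.72, 9, 13.06, 0.62)
print(round(sigma_hat, 3), round(std_err, 3), round(t, 2), dof)
```

The probability itself still has to be read from a t table such as Appendix I, since the Python standard library has no t distribution.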



Chart 24.14. The t Distribution for n = 23, Showing Probability of
Obtaining t = ±1.71 or More. Approximately 0.10 of the area under the curve
is in the two tails.


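Confidence limits of $\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2}$. Occasionally, when it has been
concluded that a significant difference exists between $\bar{X}_1$ and $\bar{X}_2$, it may
be desirable to have a statement of the confidence limits of $\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2}$.
This is obtained by solving the expression

$$\bar{X}_1 - \bar{X}_2 = (\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2}) \pm t\,\hat{\sigma}_{\bar{X}_1-\bar{X}_2}$$

for $\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2}$. As in the determination of confidence limits for $\bar{X}_{\mathcal{P}}$,
the value of $t$ is read from Appendix I and depends upon (1) the level
of confidence to be used and (2) the degrees of freedom, which are
$n = N_1 - 1 + N_2 - 1$.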

To illustrate the use of the expression given above, consider the yield
point of structural steel (for ships) obtained from two sources. For
source 1: $N_1 = 10$, $\bar{X}_1 = 45,948$ pounds per square inch, and $s_1 = 2,910$
pounds per square inch. For source 2: $N_2 = 19$, $\bar{X}_2 = 39,820$ pounds per
square inch, and $s_2 = 2,510$ pounds per square inch. Employing the
same expressions just used for the data of lower first molars, it is found
that $\hat{\sigma}_{\bar{X}_1-\bar{X}_2} = 1,074.9$ and

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma}_{\bar{X}_1-\bar{X}_2}} = \frac{45,948 - 39,820}{1,074.9} = \frac{6,128}{1,074.9} = 5.70.$$

This value of $t$ for $n = n_1 + n_2 = 9 + 18 = 27$ is far beyond the 0.001
level, so the difference between the means is significant.

To obtain the 98 per cent confidence limits of $\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2}$, we use
$t = 2.473$ and substitute the known values in

$$\bar{X}_1 - \bar{X}_2 = (\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2}) \pm t\,\hat{\sigma}_{\bar{X}_1-\bar{X}_2}.$$

This gives

$$45,948 - 39,820 = (\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2}) \pm (2.473)(1,074.9),$$

$$\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2} = 6,128 \pm 2,658,$$

= 3,470 and 8,786 pounds per square inch.

As in testing the significance of the difference between $\bar{X}_1$ and $\bar{X}_2$, it is assumed
that the two samples are from the same population in regard to $\sigma^2$.
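As a check on the steel illustration, the following Python sketch recomputes the standard error and the 98 per cent limits from the summary figures. The value $t = 2.473$ is taken from the text (Appendix I) and passed in as a parameter; the function name is an assumption made for illustration only.

```python
import math

def difference_confidence_limits(n1, mean1, s1, n2, mean2, s2, t_value):
    """Confidence limits for the difference between two population means.

    s1 and s2 use divisor N, so each sum of squares is N * s**2.
    t_value must be read from a t table for n = N1 - 1 + N2 - 1.
    """
    sum_sq = n1 * s1 ** 2 + n2 * s2 ** 2
    sigma_hat = math.sqrt(sum_sq / (n1 - 1 + n2 - 1))
    std_err = sigma_hat * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    half_width = t_value * std_err
    return std_err, diff - half_width, diff + half_width

std_err, lower, upper = difference_confidence_limits(
    10, 45948, 2910, 19, 39820, 2510, t_value=2.473)
print(round(std_err, 1), round(lower), round(upper))
```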

Non-independent samples. When inherent pairing exists between 
the pairs of items in two samples, it usually follows that the two samples 
are not independent. We are not concerned if the first, and succeeding, 
pairs of values in the two samples just happen to be paired because they 
were selected in the order listed; we are concerned if, for example, the 
paired readings are values of IQ's of brothers and sisters or of twins, or
if the values are mileages of tires on original treads and after recapping.
The great majority of problems which will be encountered will
deal with independent samples. However, it is extremely important
that non-independent samples be recognized as such; they must not be 
treated as independent samples. 

The data of Table 24.2 show the percentage of solids in the shaded and
exposed halves of 25 grapefruit. Here, it is obvious that the two sets of
data are not independent; they are inherently paired. The shaded side
of grapefruit Number 1 had 8.59 per cent solids while the exposed side
of the same grapefruit had 8.49 per cent solids. These two figures are 
inherently paired with each other, because they refer to the same indi- 
vidual fruit. The same is true of the figures for the other 24 grapefruit. 


The steel data are from the source given in footnote 15.
TABLE 24.2

Percentage of Solids in the Shaded and Exposed Halves
of 25 Grapefruit

Fruit    Shaded    Exposed    D = X1 - X2      D^2
           X1         X2

   1      8.59       8.49         0.10        0.0100
   2      8.59       8.59         0.00        0.0000
   3      8.09       7.84         0.25        0.0625
   4      8.54       7.89         0.65        0.4225
   5      8.09       8.19        -0.10        0.0100
   6      8.49       7.84         0.65        0.4225
   7      7.89       7.89         0.00        0.0000
   8      8.59       7.89         0.70        0.4900
   9      8.54       7.79         0.75        0.5625
  10      7.99       7.84         0.15        0.0225
  11      7.89       7.79         0.10        0.0100
  12      8.09       7.84         0.25        0.0625
  13      7.89       7.89         0.00        0.0000
  14      8.54       8.07         0.47        0.2209
  15      7.84       7.97        -0.13        0.0169
  16      7.49       7.57        -0.08        0.0064
  17      7.89       7.92        -0.03        0.0009
  18      7.79       7.97        -0.18        0.0324
  19      7.84       8.17        -0.33        0.1089
  20      8.89       8.67         0.22        0.0484
  21      8.54       8.07         0.47        0.2209
  22      8.04       7.97         0.07        0.0049
  23      8.59       8.62        -0.03        0.0009
  24      8.19       7.92         0.27        0.0729
  25      8.59       7.97         0.62        0.3844

Total   205.50     200.66         4.84        3.1938

Data from Paul L. Harding, Plant Physiologist, Division of Fruit and
Vegetable Crops and Diseases, Bureau of Plant Industry, Soils and Agri-
cultural Engineering, Agricultural Research Administration, United States
Department of Agriculture.

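$$\bar{X}_D = \frac{\Sigma D}{N} = \frac{4.84}{25} = 0.194 \text{ per cent.}$$

$$\hat{\sigma}_D = \sqrt{\frac{\Sigma D^2}{N - 1} - \frac{(\Sigma D)^2}{N(N - 1)}} = \sqrt{\frac{3.1938}{24} - \frac{(4.84)^2}{25(24)}}$$

$$= \sqrt{0.133075 - 0.039043} = \sqrt{0.094032} = 0.307 \text{ per cent.}$$

$$\hat{\sigma}_{\bar{X}_D} = \frac{\hat{\sigma}_D}{\sqrt{N}} = \frac{0.307}{\sqrt{25}} = 0.061 \text{ per cent.}$$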

In order to test the significance of the difference between the means for
shaded and exposed halves, we obtain the difference $D$ between each pair
of values, determine the value of $\bar{X}_D$, and ascertain whether $\bar{X}_D$ differs
significantly from 0. The null hypothesis is that $\bar{X}_D$ is the mean of a
random sample from a population of differences having a mean of zero.
Below Table 24.2 the computations are shown which give

$\bar{X}_D = 0.194$ per cent,
$\hat{\sigma}_D = 0.307$ per cent, and
$\hat{\sigma}_{\bar{X}_D} = 0.061$ per cent.

We then determine the value of $t$,

$$t = \frac{\bar{X}_D - 0}{\hat{\sigma}_{\bar{X}_D}} = \frac{0.194 - 0}{0.061} = 3.18.$$

Since there are 24 independent $D$ values, $n = 24$, and reference to Appen-
dix I shows that P is between 0.01 and 0.001.
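The paired computation can be repeated directly from the figures of Table 24.2. The Python sketch below is only an illustration (the variable names are not from the text). Carried at full precision the ratio comes out near 3.16; the 3.18 above results from dividing the rounded values 0.194 and 0.061. The conclusion is unaffected.

```python
import math

shaded = [8.59, 8.59, 8.09, 8.54, 8.09, 8.49, 7.89, 8.59, 8.54, 7.99,
          7.89, 8.09, 7.89, 8.54, 7.84, 7.49, 7.89, 7.79, 7.84, 8.89,
          8.54, 8.04, 8.59, 8.19, 8.59]
exposed = [8.49, 8.59, 7.84, 7.89, 8.19, 7.84, 7.89, 7.89, 7.79, 7.84,
           7.79, 7.84, 7.89, 8.07, 7.97, 7.57, 7.92, 7.97, 8.17, 8.67,
           8.07, 7.97, 8.62, 7.92, 7.97]

# D = X1 - X2 for each fruit
diffs = [x1 - x2 for x1, x2 in zip(shaded, exposed)]
n = len(diffs)
sum_d = sum(diffs)
sum_d2 = sum(d * d for d in diffs)

mean_d = sum_d / n
# sigma_hat_D = sqrt(sum D^2 / (N - 1) - (sum D)^2 / (N (N - 1)))
sigma_hat_d = math.sqrt(sum_d2 / (n - 1) - sum_d ** 2 / (n * (n - 1)))
std_err = sigma_hat_d / math.sqrt(n)
t = (mean_d - 0) / std_err

print(round(mean_d, 3), round(sigma_hat_d, 3), round(std_err, 3), round(t, 2))
```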

It is very important that the lack of independence between the two 
samples be recognized in such a problem as this. Had we followed the 
usual procedure, which assumes the samples to be independent, com-
puting $\bar{X}_1 = 8.22$ per cent, $\bar{X}_2 = 8.03$ per cent, and $\hat{\sigma}_{\bar{X}_1-\bar{X}_2} = 0.092$ per
cent, we would have obtained

$$t = \frac{8.22 - 8.03}{0.092} = \frac{0.19}{0.092} = 2.07,$$

which, for $n = 48$, has 0.025 < P < 0.05. This probability differs
greatly from that found first. In fact, if one were using the 0.02 or
0.01 level as a criterion of significance, the method assuming inde-
pendence of the two samples would have led one erroneously to conclude
"not significant."

The possible consequences of employing the method which assumes
independence of the two samples when they are not, in fact, independent
may be clarified by writing $\hat{\sigma}_{\bar{X}_D}$ in its alternative form,

$$\hat{\sigma}_{\bar{X}_1-\bar{X}_2} = \sqrt{\hat{\sigma}_{\bar{X}_1}^2 + \hat{\sigma}_{\bar{X}_2}^2 - 2r\hat{\sigma}_{\bar{X}_1}\hat{\sigma}_{\bar{X}_2}},$$

where $r$ is the correlation between the two samples. If the shorter form,

$$\hat{\sigma}_{\bar{X}_1-\bar{X}_2} = \sqrt{\hat{\sigma}_{\bar{X}_1}^2 + \hat{\sigma}_{\bar{X}_2}^2},$$

which assumes independence, is used, the value of $\hat{\sigma}_{\bar{X}_1-\bar{X}_2}$ will be too large
when there is positive correlation between the two sets of data and too
small when negative correlation is present. Ignoring the lack of independence
may cause us to fail to declare a significant difference when $r$
is positive and to erroneously declare a difference to be significant when
$r$ is negative. In most problems involving inherent pairing, the correla-
tion will be positive, but occasional cases occur in which the correlation
is negative. In any event, when inherent pairing occurs, correlation
between the two series is also almost certain to be present. The chance
correlation that may appear between two series having $N_1 = N_2$ and
known to be independent is of no concern to us.

The two forms are exact equivalents, but the expression involving $r$ requires
much more computation. For the grapefruit data, using $r = +0.577$, $\hat{\sigma}_{\bar{X}_1-\bar{X}_2} = 0.061$,
which agrees with the value for $\hat{\sigma}_{\bar{X}_D}$.
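The equivalence of the two ways of obtaining the standard error, and the effect of dropping the correlation term, can be checked numerically. The Python sketch below uses the grapefruit figures of Table 24.2; the helper names are illustrative assumptions, and $r$ is computed from the data rather than taken as given.

```python
import math

shaded = [8.59, 8.59, 8.09, 8.54, 8.09, 8.49, 7.89, 8.59, 8.54, 7.99,
          7.89, 8.09, 7.89, 8.54, 7.84, 7.49, 7.89, 7.79, 7.84, 8.89,
          8.54, 8.04, 8.59, 8.19, 8.59]
exposed = [8.49, 8.59, 7.84, 7.89, 8.19, 7.84, 7.89, 7.89, 7.79, 7.84,
           7.79, 7.84, 7.89, 8.07, 7.97, 7.57, 7.92, 7.97, 8.17, 8.67,
           8.07, 7.97, 8.62, 7.92, 7.97]

def mean(values):
    return sum(values) / len(values)

def sum_sq_dev(values):
    """Sum of squared deviations about the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

n = len(shaded)
m1, m2 = mean(shaded), mean(exposed)
ss1, ss2 = sum_sq_dev(shaded), sum_sq_dev(exposed)
sum_cross = sum((x1 - m1) * (x2 - m2) for x1, x2 in zip(shaded, exposed))
r = sum_cross / math.sqrt(ss1 * ss2)            # correlation of the pairs

# Estimated standard errors of the two means (divisor N - 1 for variance)
se1 = math.sqrt(ss1 / (n - 1) / n)
se2 = math.sqrt(ss2 / (n - 1) / n)

se_with_r = math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
se_independent = math.sqrt(se1 ** 2 + se2 ** 2)  # ignores the pairing

# Direct route: standard error of the mean difference
diffs = [x1 - x2 for x1, x2 in zip(shaded, exposed)]
se_paired = math.sqrt(sum_sq_dev(diffs) / (n - 1) / n)

print(round(r, 3), round(se_with_r, 3), round(se_paired, 3),
      round(se_independent, 3))
```

Because the correlation here is positive, the form that ignores it gives the larger standard error and hence the smaller $t$.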

CONCLUSION 

This chapter has made no attempt to contrast "large-number methods"
and "small-number methods." The reason is that when $\sigma$ is known, the
normal curve is appropriate for samples of any size, large or small. When
$\sigma$ is not known, and when $\hat{\sigma}$ is employed in its place, the t distribution (a
"small-number method") is always the proper distribution to use. As
$n$ increases, the t distribution approaches the normal curve, so that for
large samples the normal distribution is sometimes applied. However,
even when $n$ is large, the normal curve is an approximation. Sometimes,
when a sample is large, $s$ rather than $\hat{\sigma}$ is used as an estimate of $\sigma$. The
numerical difference between $s$ and $\hat{\sigma}$ is slight for large samples, but the
use of $s$ as an estimate of $\sigma$ should be avoided.

Since the methods discussed in this chapter are just as applicable to 
small samples as to large samples, the question may arise: why bother 
to use large samples? The answer is that, when one makes use of large
samples, a smaller observed difference $\bar{X} - \bar{X}_{\mathcal{P}}$ or $\bar{X}_1 - \bar{X}_2$ is necessary
to obtain significance at a specified probability level. This is true, (1)
because $\hat{\sigma}_{\bar{X}}$ and $\hat{\sigma}_{\bar{X}_1-\bar{X}_2}$ tend to decrease with an increase in sample
size, while $\bar{X} - \bar{X}_{\mathcal{P}}$ and $\bar{X}_1 - \bar{X}_2$ do not have a corresponding tendency
to decrease, since they may either increase or decrease; also, (2) because
the t value required for the specified probability level decreases as n 
increases. Occasionally, as a result of using small samples, one may come 
to the conclusion that an observed difference is not significant, when, if 
large samples had been used, the difference (which itself would probably 
change) might have been significant. 

The tests discussed in this chapter undertook to ascertain whether 
statistical differences were or were not present. It is worth while to note 
that generic differences, as opposed to statistical differences, may exist, 
and that, when a generic difference is present, a statistical difference may 
or may not also be present. A generic difference is an actual difference 
in kind and may, for example, refer to males and females, railroad ties of 
different kinds of wood or preserved by different processes, or roofing 
nails made of copper or galvanized steel. The tests of yield points of
structural steel, referred to earlier in this chapter, are an illustration of a
case where a generic difference and a statistical difference were both
present; the steel from Source 1 was lighter-weight material than was the
steel from Source 2. If tests were to be made of the reaction times of a
group of rabbits and a group of guinea pigs, it is quite possible that a
statistically significant difference in reaction times might not be present 
although the two groups are generically different. 



Symbols Used in Chapter 25

Part 1: Proportions

$a$: number of occurrences in a sample.
$a_1$: number of occurrences in sample 1.
$a_2$: number of occurrences in sample 2.
$\alpha$: lower-case Greek alpha; number of occurrences in a population.
$A$: indicating an occurrence; $A$ has no numerical value.
$b$: number of non-occurrences in a sample.
$\beta$: lower-case Greek beta; number of non-occurrences in a population.
$B$: indicating a non-occurrence; $B$ has no numerical value.
$k$: number of samples.
$N$: the number of items in a sample.
$N_1$: number of items in sample 1.
$N_2$: number of items in sample 2.
$p$: proportion of occurrences in a sample.
$p_k$: proportion of occurrences in the $k$'th sample.
$p_1$: proportion of occurrences in sample 1.
$p_2$: proportion of occurrences in sample 2.
$\bar{p}$: an estimate of $\pi$ based on two samples; a weighted average of $p_1$ and $p_2$.
$P$: probability; varies from 0 to 1.
$\pi$: lower-case Greek pi; proportion of occurrences in a population.
$\pi_1$: the lower confidence limit of $\pi$.
$\pi_2$: the upper confidence limit of $\pi$.
$q$: proportion of non-occurrences in a sample. $q = 1 - p$.
$q_1$: proportion of non-occurrences in sample 1.
$q_2$: proportion of non-occurrences in sample 2.
$\bar{q}$: $1 - \bar{p}$.
$\sigma_a$: the standard error of $a$.
$\sigma_p$: the standard error of $p$.
$\hat{\sigma}_{p_1-p_2}$: estimated standard error of the difference between $p_1$ and $p_2$.
$\tau$: lower-case Greek tau; proportion of non-occurrences in a population.
$\tau = 1 - \pi$.
$\frac{x}{\sigma}$: a deviation divided by its standard error; for example, $\frac{p - \pi}{\sigma_p}$ and $\frac{a - \pi N}{\sigma_a}$.

Part 2: The Chi-Square Test

$a$: number of occurrences in a sample.
$a_1$: number of observed frequencies in the upper left cell of a 2 × 2 table
or, in general, in any 2 × R table.
$a_2$: number of observed frequencies in the second row of the first column
of a 2 × R table; in the lower left cell of a 2 × 2 table.
$a_3$: number of observed frequencies in the third row of the first column of
a 2 × R table.
$A$: indicating an occurrence; $A$ has no numerical value.
$b$: number of non-occurrences in a sample.
$b_1$: number of observed frequencies in the upper right cell of a 2 × 2 table
or, in general, in any 2 × R table.
$b_2$: number of observed frequencies in the second row of the second
column of a 2 × R table; in the lower right cell of a 2 × 2 table.
$b_3$: number of observed frequencies in the third row of the second column
of a 2 × R table.
$B$: indicating a non-occurrence; $B$ has no numerical value.
$C$: number of columns of observed frequencies (exclusive of totals) in a
chi-square table which has its marginal totals set.
$f$: an observed frequency.
$f_c$: a computed frequency.
$n$: degrees of freedom.
$N$: number of items in a sample. For 2 × 2 and larger tables, $N$ is the
number of items in the entire table.
$N_a$: number of frequencies (items) in the first column of a 2 × R table.
$N_b$: number of frequencies (items) in the second column of a 2 × R table.
$N_1, N_2, N_3, \cdots$: respectively, number of frequencies (items) in the
first, second, third, $\cdots$ row of a 2 × R table.
$p$: proportion of occurrences in a sample.
$p_1$: proportion of occurrences in sample 1.
$p_2$: proportion of occurrences in sample 2.
$P$: probability; varies from 0 to 1.
$\pi$: lower-case Greek pi; proportion of occurrences in a population.
$R$: number of rows of observed frequencies (exclusive of totals) in a chi-
square table which has its marginal totals set.
$\sigma^2$: the variance of a population.
$\hat{\sigma}^2$: the estimated variance of a population.
$\sigma_a$: the standard error of $a$.
$\sigma_p$: the standard error of $p$.
$\Sigma$: upper-case Greek sigma; meaning "take the sum of."
$\frac{x}{\sigma}$: a deviation divided by its standard error; for example, $\frac{p - \pi}{\sigma_p}$.
$\chi^2$: chi-square. The symbol is a lower-case Greek chi.
$!$: factorial. For example, 4! = 1 × 2 × 3 × 4.
In this chapter we shall consider significance tests for dealing with pro-
portions from random samples; we shall also give attention to certain
aspects of the chi-square test. The reason for combining these two topics
in one chapter lies in the fact that the $\chi^2$ test and the approximate tests
for proportions represent alternative methods of arriving at identical con-
clusions. This will be clarified in the second part of the chapter.

PART 1: PROPORTIONS 

The following discussion of proportions obtained from random samples
will deal, first, with the significance of the difference between a sample
proportion ($p$) and the proportion in the population ($\pi$) when the propor-
tion in the population is known; second, with the confidence limits of $\pi$
when only $p$ and $N$ are known; and, finally, with the significance of the
difference between the proportions of two random samples ($p_1$ and $p_2$).

Significance of the Difference Between $p$ and $\pi$

The exact test, $\pi = 0.50$. In a large assortment of marbles, half are
black and half are white. The marbles do not differ from each other in
any respect except color. Considering a black marble as an "occurrence"
and a white marble as a "non-occurrence" (that is, non-occurrence of
black), and using $\pi$ to indicate the proportion¹ of occurrences in the
population and $\tau$ the proportion of non-occurrences, we have $\pi = 0.50$ and
$\tau = 0.50$. Suppose that a sample of 10 marbles is presented, which has
9 black marbles. We have then: number of occurrences, $a = 9$; number
of non-occurrences, $b = 1$; proportion of occurrences, $p = 0.90$; propor-
tion of non-occurrences, $q = 0.10$. Note that

$$p = \frac{a}{a + b} = \frac{a}{N},$$

$$q = \frac{b}{a + b} = \frac{b}{N},$$

$$p + q = 1.0.$$

Using P = 0.05 as a criterion, let us test the hypothesis that the sample is
a random one from the population having $\pi = 0.50$.

¹ When the number of occurrences ($\alpha$) and the number of non-occurrences ($\beta$) in
a population are known, $\pi = \frac{\alpha}{\alpha + \beta}$ and $\tau = \frac{\beta}{\alpha + \beta}$. From these it is clear that
$\pi + \tau = 1.0$ and $\tau = 1 - \pi$.

Samples of $N = 10$ can have $a = 0, 1, 2, \cdots, 10$ and $p = 0, 0.1,
0.2, \cdots, 1.0$, according to the expression

$$(\tau B + \pi A)^{10},$$

where $A$ and $B$, which have no numerical value, are used to indicate,
respectively, an occurrence and a non-occurrence. Since $\pi = 0.50$ and
$\tau = 0.50$,

$$\begin{aligned}
(\tau B + \pi A)^{10} &= (0.50B + 0.50A)^{10} \\
&= (0.50B)^{10} + 10(0.50B)^{9}(0.50A) \\
&\quad + 45(0.50B)^{8}(0.50A)^{2} + 120(0.50B)^{7}(0.50A)^{3} \\
&\quad + 210(0.50B)^{6}(0.50A)^{4} + 252(0.50B)^{5}(0.50A)^{5} \\
&\quad + 210(0.50B)^{4}(0.50A)^{6} + 120(0.50B)^{3}(0.50A)^{7} \\
&\quad + 45(0.50B)^{2}(0.50A)^{8} + 10(0.50B)(0.50A)^{9} \\
&\quad + (0.50A)^{10}.
\end{aligned}$$

Performing the indicated computations and placing the results in colum-
nar form gives:

Number of occurrences    Proportion of occurrences
    of black balls            of black balls          Probability
          a                         p

          0                        0                    0.0010
          1                        0.1                  0.0098
          2                        0.2                  0.0439
          3                        0.3                  0.1172
          4                        0.4                  0.2051
          5                        0.5                  0.2461
          6                        0.6                  0.2051
          7                        0.7                  0.1172
          8                        0.8                  0.0439
          9                        0.9                  0.0098
         10                        1.0                  0.0010

                                                        1.0000
From the foregoing, it appears that the probability of obtaining random
samples having 9 or 10 black marbles is 0.0098 + 0.0010 = 0.0108. This
is represented by the two bars at the extreme right in Chart 25.1. Since
we have no reason to believe that the samples would always contain a
larger proportion of black marbles than did the population, we consider
likewise the probability of one or no black balls, which is also 0.0108 and
which is represented by the two bars at the extreme left in Chart 25.1.

Chart 25.1. Probability of Occurrence of Values of a and p in Samples of
10 When π = 0.50. Obtained from the expansion of $(0.50B + 0.50A)^{10} = 0.0010B^{10}
+ 0.0098B^{9}A + 0.0439B^{8}A^{2} + 0.1172B^{7}A^{3} + 0.2051B^{6}A^{4} + 0.2461B^{5}A^{5} + 0.2051B^{4}A^{6}
+ 0.1172B^{3}A^{7} + 0.0439B^{2}A^{8} + 0.0098BA^{9} + 0.0010A^{10}$.

The probability of 9 or more and 1 or fewer black marbles is therefore
0.0216. Using the criterion of 0.05, we reject the hypothesis that the
sample was a random one from the population having $\pi = 0.50$. Remem-
ber that, on the basis of this criterion, we would make Type I errors in
5 per cent of our conclusions.

If we had been using 0.01 as our criterion, we would not have rejected 
our hypothesis. Had we been employing 0.01 as our criterion and had 
we been concerned with samples having ten (or no) black balls, the proba- 
bility would have been 0.0020 and we would have rejected the hypothesis. 
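The exact test just described amounts to evaluating the terms of the binomial and adding those in the two tails. A minimal Python sketch follows; the function name is illustrative. Note that the unrounded tail sum is 22/1024, or about 0.0215; the 0.0216 of the text comes from adding terms already rounded to four places.

```python
from math import comb

def binomial_terms(n, pi):
    """Probability of a = 0, 1, ..., n occurrences in a sample of n."""
    return [comb(n, a) * pi ** a * (1 - pi) ** (n - a) for a in range(n + 1)]

terms = binomial_terms(10, 0.50)
for a, prob in enumerate(terms):
    print(a, a / 10, round(prob, 4))

upper_tail = sum(terms[9:])               # 9 or 10 black marbles
lower_tail = sum(terms[:2])               # 0 or 1 black marbles
two_tail = upper_tail + lower_tail

print(round(two_tail, 4), two_tail <= 0.05, two_tail <= 0.01)
```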

An approximate test, $\pi = 0.50$. It has already been pointed out
(pages 591-594) that the normal curve is the limit of the binomial as the
exponent of the binomial approaches infinity. For practical purposes,
the normal curve is often considered to be a reasonably good description
of the binomial

$$(0.50B + 0.50A)^{N}$$

when $N \geq 20$. Chart 25.2 shows a normal curve fitted to $(0.50B +
0.50A)^{20}$. As we shall see later, the apparently good description of the
binomial by the normal curve is no guarantee that the procedure involving
the use of the normal curve will lead us to the same conclusion as the
binomial.

Chart 25.2. Normal Curve Fitted to $(0.50B + 0.50A)^{20}$. (Horizontal scales:
a from 0 to 20 and p from 0 to 1.0; vertical scale: probability.)

If the normal curve can be substituted for the binomial, we may com-
pute the standard deviation of a sample percentage $\sigma_p$, ascertain the value
of

$$\frac{x}{\sigma} = \frac{p - \pi}{\sigma_p},$$

and proceed as in Chapter 24 for testing $\bar{X} - \bar{X}_{\mathcal{P}}$ when $\sigma$ is known. If we
had a large number of sample proportions ($p_1, p_2, p_3, \cdots, p_k$), all from
random samples from the same population, we could compute the stand-
ard deviation of those proportions from

$$\sigma_p = \sqrt{\frac{(p_1 - \pi)^2 + (p_2 - \pi)^2 + \cdots + (p_k - \pi)^2}{k}}.$$
It is very unusual to have a large number of such $p$ values, but it can be
shown² that, when $\pi$ is known, the standard error of $p$ from random
samples is

$$\sigma_p = \sqrt{\frac{\pi\tau}{N}}.$$

Alternative forms which are sometimes useful are

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{N}} = \sqrt{\frac{\pi - \pi^2}{N}}.$$

Let us see whether the approximate test will lead to the same conclusion
as did the exact test for the marbles, where $\pi = 0.50$, $a = 9$, $p = 0.90$,
and $N = 10$. We first compute

$$\sigma_p = \sqrt{\frac{(0.50)(0.50)}{10}} = 0.158$$

and then

$$\frac{x}{\sigma} = \frac{p - \pi}{\sigma_p} = \frac{0.90 - 0.50}{0.158} = \frac{0.40}{0.158} = 2.53.$$

From Appendix H, which shows areas in two tails of a normal curve, we
find that P = 0.0114. Although this value for P is smaller than the
value of 0.0216, obtained by use of the binomial, our conclusion is the
same; if 0.05 is our criterion, the hypothesis is rejected. Note, however,
that if 0.02 had been used as the criterion, the exact method would tell us to
accept the hypothesis while the approximate procedure indicates that the
hypothesis should be rejected.
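The approximate test can be sketched in a few lines of Python. The two-tailed area of the normal curve, which the text reads from Appendix H, is obtained here from the complementary error function of the standard library; the function names are illustrative assumptions, not part of the text.

```python
import math

def proportion_ratio(a, n, pi):
    """x / sigma = (p - pi) / sigma_p, with sigma_p = sqrt(pi * tau / N)."""
    p = a / n
    sigma_p = math.sqrt(pi * (1 - pi) / n)
    return (p - pi) / sigma_p

def two_tailed_normal_area(ratio):
    """Area in the two tails of the normal curve beyond +/- ratio."""
    return math.erfc(abs(ratio) / math.sqrt(2))

ratio = proportion_ratio(9, 10, 0.50)
p_value = two_tailed_normal_area(ratio)
print(round(ratio, 2), round(p_value, 4))
```

Comparing `p_value` with the exact binomial figure of 0.0216 shows why the two procedures disagree when 0.02 is the criterion.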

A useful alternative form of the approximate test involves testing the
significance of the difference between $a$ and $\pi N$ (the number of occurrences
in the sample if the proportion of occurrences in the sample were the same
as in the population) by use of

$$\frac{x}{\sigma} = \frac{a - \pi N}{\sigma_a},$$

where³ $\sigma_a = \sqrt{N\pi\tau}$. For our problem,

$$\sigma_a = \sqrt{10(0.50)(0.50)} = 1.58$$

and

$$\frac{x}{\sigma} = \frac{9 - (0.50)10}{1.58} = \frac{4}{1.58} = 2.53.$$

² See Appendix S, section 25.1.

³ See Appendix S, section 25.1 for a development of the expression for $\sigma_a$.
This is, of course, the same $\frac{x}{\sigma}$ value as was obtained when $p$ and $\pi$ were
compared. The conclusion, too, is the same. The hypothesis is rejected.

The fact that the approximate test guided us to the same conclusion as
did the exact test, even though the probability given by the normal curve
was incorrect, leads to an interesting question: When $\pi = 0.50$, under
what conditions may the normal curve be substituted for the binomial
and the same conclusion be arrived at concerning a hypothesis? The
answer depends on: (1) the size of the sample and (2) the criterion of
significance which is being used. Since the probability resulting from use
of the normal curve is always too small⁴ when $\pi = 0.50$, the use of the
$p - \pi$ (or $a - \pi N$) test will never cause us to accept a hypothesis which
the binomial would tell us to reject. Occasionally the $p - \pi$, or $a - \pi N$,
test will indicate the rejection of a hypothesis which the use of the
binomial would show should be accepted. Consider the situation when
$\pi = 0.50$, $N = 60$, $a = 38$ ($p = 0.63$), and the criterion is P = 0.05.
Using the binomial, it is found that the probability⁵ of obtaining $a \leq 22$
or $a \geq 38$ is 0.052, and the hypothesis (that the sample is a random one
from a population having $\pi = 0.50$) is accepted. Using the normal
curve, the probability⁶ is found to be 0.039 and would indicate that the
hypothesis should be rejected!

Yates' correction. This correction was designed to be applied to the
normal curve in order to increase the probability obtained from the use of
the normal curve, so that the probability would be more nearly in agree-
ment with the probability obtained by use of the binomial. If Yates'
correction is applied to the illustrative data just mentioned, the proba-
bility⁷ is increased from 0.039 to 0.053 and the conclusion is the same as
if the binomial had been used. Note, however, that the use of Yates'
correction has over-corrected; that is, the probability is greater than that
obtained by the binomial. This is important, since the use of the normal
curve with Yates' correction will sometimes result in the accepting of a
hypothesis which the binomial (and the use of the uncorrected normal
curve!) would indicate should be rejected. For example: $\pi = 0.50$,
$N = 25$, $a = 4$ ($p = 0.16$), and the criterion is P = 0.001. Using the
binomial, the probability of obtaining $a \leq 4$ or $a \geq 21$ is found to be
0.00091. From the normal approximation, a value of P = 0.0007 is
obtained. Applying Yates' correction, this value of P is increased to
0.00137. In this case, the uncorrected normal approximation agrees
with the binomial, indicating that the hypothesis should be rejected.
Applying Yates' correction increases the probability to such an extent
that the hypothesis would be accepted!

⁴ This will be seen to be the case for the various illustrations given in this text. An
explanation is given in the reference mentioned in footnote 7.

⁵ The probability may be obtained from a table in H. G. Romig, 50-100 Binomial
Tables, John Wiley and Sons, New York, 1953.

⁶ The computations are:

$$\frac{x}{\sigma} = \frac{a - \pi N}{\sigma_a} = \frac{38 - 30}{\sqrt{60(0.50)(0.50)}} = 2.066.$$

Referring to Appendix H, the value of P is seen to be 0.039.

⁷ Yates' correction is not explained in this text, since (for reasons which will later
be clear) its use is not advocated. An explanation of Yates' correction is given in
F. E. Croxton, Elementary Statistics with Applications in Medicine, Prentice-Hall, Inc.,
New York, 1953, pp. 254-256.

For the type of problem under consideration, Yates' correction involves computing
$\frac{|a - \pi N| - \frac{1}{2}}{\sigma_a}$, where $|\ |$ means "take the absolute value," and looking up the
resulting figure in Appendix H. For the illustration above,

$$\frac{x}{\sigma} = \frac{|38 - 30| - \frac{1}{2}}{\sqrt{60(0.50)(0.50)}} = 1.936.$$

From Appendix H, P = 0.053.

A table for the exact test, when $\pi = 0.50$. Extensive computa-
tions of the sort just made, and referring to the 0.05, 0.02, 0.01, and 0.001
levels, show that, while the use of the normal curve will ordinarily result
in the same conclusion being arrived at as if the binomial had been used,
this is not by any means always the case. In addition, the use of Yates'
correction will sometimes result in over-correcting to such an extent that
the conclusion to accept the hypothesis will differ from the conclusion
based on the binomial.

One possible solution may have occurred to the reader. That is, to
make the $a - \pi N$ test both with and without Yates' correction. When
the two procedures lead to the same conclusion, that conclusion will be
the same as if the binomial had been used. This is true because, as we
already know, the $a - \pi N$ test without correction results in a smaller P
value than does the binomial, while the $a - \pi N$ test with Yates' correc-
tion yields a larger P value than does the binomial. The difficulty with
this solution is that contradictory conclusions do occasionally occur.⁸
Whenever the two procedures result in different conclusions, resort must
be had to the binomial.

⁸ Another illustration: when using P = 0.05 as the criterion and with $\pi = 0.50$,
$N = 100$, and $a = 40$.

The best solution is to make use of the binomial whenever possible.
Following procedures described before, it is not difficult to expand
binomials up to about $N = 20$ or 30; but beyond that, the work becomes
extensive. Books are now available from which one may read the values
of the terms of binomials⁹ for (1) $N = 2$ to $N = 49$ by steps of 1, and (2)
for $N = 50$ to $N = 100$ by steps of 5. Values of $\pi$ other than 0.50 are
given, but at this point of our discussion we are interested only in $\pi =
0.50$. From these tables, Table 25.1 has been constructed, showing the
value of $a$ at various probability points and for selected values of $N$.
With a table such as this available, one has no need to use the normal
curve, with or without Yates' correction, in order to avoid the labor of
expanding a binomial. Neither is it necessary to expand a binomial,
since Table 25.1 gives the results of such expansions.

For samples having $N > 100$, the normal approximation will have to
be used until some organization with extensive computing facilities can
provide us with extended tables of binomials.
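The comparisons made in this section between the binomial, the normal curve, and the normal curve with Yates' correction can be recomputed with the Python sketch below. It is restricted, like the discussion, to $\pi = 0.50$; the function names are illustrative, and the normal-curve areas come from the standard library's complementary error function rather than from Appendix H.

```python
import math
from math import comb

def exact_two_tailed(a, n):
    """Exact two-tailed binomial probability when pi = 0.50.

    Relies on the symmetry of the binomial for pi = 0.50: the tail at
    least as extreme as a is computed on one side and doubled.
    Intended for a different from n / 2.
    """
    k = min(a, n - a)
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return 2 * one_tail

def normal_two_tailed(a, n, yates=False):
    """Two-tailed normal-curve probability for the a - pi*N test, pi = 0.50."""
    sigma_a = math.sqrt(n * 0.50 * 0.50)
    deviation = abs(a - 0.50 * n)
    if yates:
        deviation -= 0.5      # Yates' correction: reduce |a - pi*N| by 1/2
    return math.erfc(deviation / sigma_a / math.sqrt(2))

for n, a in [(60, 38), (25, 4), (100, 40)]:
    print(n, a,
          round(exact_two_tailed(a, n), 5),
          round(normal_two_tailed(a, n), 5),
          round(normal_two_tailed(a, n, yates=True), 5))
```

In each of the three cases the uncorrected figure falls below the binomial figure and the corrected figure above it, which is the pattern the text describes.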

The exact test, $\pi \neq 0.50$. A cigarette company published the results
of a "test" in which their product and those of three competitors were
judged by eight physicians specializing in the treatment of the nose and
the throat. Four of the 8 doctors indicated a preference for the com-
pany's cigarette, which we shall call brand No. 1; two preferred No. 2;
none preferred No. 3; and 2 preferred No. 4. If there were no difference
between the four brands, each would have an equal chance of being
selected, so that the probability of brand No. 1 being preferred would be
0.25; $\pi = 0.25$. Now, we wish to evaluate, in the expression

$$(0.75B + 0.25A)^{8},$$

the terms which include $A^4$, $A^5$, $A^6$, $A^7$, and $A^8$. As before, $A$ indicates
an occurrence (in this instance, a preference for brand No. 1), and $B$
indicates a non-occurrence.


⁹ These are: (1) National Bureau of Standards, Tables of the Binomial Probability
Distribution, Washington, 1949, and (2) H. G. Romig, 50-100 Binomial Tables, John
Wiley and Sons, New York, 1953. The symbols used in these references differ from
those used in this text. The equivalences are:

This text    Reference (1)    Reference (2)

    a              r                x
    N              n                n
    π              p                p

The reader is urged to remember that, when reversing cumulations of probabilities
such as are given in these references, by taking one minus the cumulative probability,
he must: (1) decrease the tabled $a$ value by one when the original cumulation is of the
"or more" type, as in the Bureau of Standards volume, and (2) increase the tabled $a$
value by one when the original cumulation is of the "or less" type, as in the Romig
book.
TABLE 25.1

Values of a at Selected Lower and Upper Probability Points
for Specified Values of N
π = 0.50

Notes for the use of this table: (1) each a value shown for a lower probability point,
together with all a values smaller than the one shown, has the indicated probability or
less; (2) each a value shown for an upper probability point, together with all a values
larger than the one shown, has the indicated probability or less.

          P ≤ 0.05          P ≤ 0.02          P ≤ 0.01          P ≤ 0.001
        Lower   Upper     Lower   Upper     Lower   Upper     Lower   Upper
        0.025   0.025     0.01    0.01      0.005   0.005     0.0005  0.0005
   N    point   point     point   point     point   point     point   point

   5
   6      0       6
   7      0       7         0       7
   8      0       8         0       8         0       8
   9      1       8         0       9         0       9
  10      1       9         0      10         0      10
  11      1      10         1      10         0      11         0      11
  12      2      10         1      11         1      11         0      12
  13      2      11         1      12         1      12         0      13
  14      2      12         2      12         1      13         0      14
  15      3      12         2      13         2      13         1      14
  16      3      13         2      14         2      14         1      15
  17      4      13         3      14         2      15         1      16
  18      4      14         3      15         3      15         1      17
  19      4      15         4      15         3      16         2      17
  20      5      15         4      16         3      17         2      18
  21      5      16         4      17         4      17         2      19
  22      5      17         5      17         4      18         3      19
  23      6      17         5      18         4      19         3      20
  24      6      18         5      19         5      19         3      21
  25      7      18         6      19         5      20         4      21
  26      7      19         6      20         6      20         4      22
  27      7      20         7      20         6      21         4      23
  28      8      20         7      21         6      22         5      23
  29      8      21         7      22         7      22         5      24
  30      9      21         8      22         7      23         5      25
  31      9      22         8      23         7      24         6      25
  32      9      23         8      24         8      24         6      26
  33     10      23         9      24         8      25         6      27
  34     10      24         9      25         9      25         7      27
  35     11      24        10      25         9      26         7      28
  36     11      25        10      26         9      27         7      29
  37     12      25        10      27        10      27         8      29
  38     12      26        11      27        10      28         8      30
  39     12      27        11      28        11      28         8      31
  40     13      27        12      28        11      29         9      31
  41     13      28        12      29        11      30         9      32
  42     14      28        13      29        12      30        10      32
  43     14      29        13      30        12      31        10      33
  44     15      29        13      31        13      31        10      34
  45     15      30        14      31        13      32        11      34
  46     15      31        14      32        13      33        11      35
  47     16      31        15      32        14      33        11      36
  48     16      32        15      33        14      34        12      36
  49     17      32        15      34        15      34        12      37
  50     17      33        16      34        15      35        13      37
  55     19      36        18      37        17      38        14      41
  60     21      39        20      40        19      41        16      44
  65     24      41        22      43        21      44        18      47
  70     26      44        24      46        23      47        20      50
  75     28      47        26      49        25      50        22      53
  80     30      50        29      51        28      52        24      56
  85     32      53        31      54        30      55        26      59
  90     35      55        33      57        32      58        29      61
  95     37      58        35      60        34      61        31      64
 100     39      61        37      63        36      64        33      67
Table 25.2 shows the probability of each of the nine terms of the binomial.
Adding the probabilities for the last five of the terms gives 0.1138,
which is the probability of obtaining four or more favorable statements
for brand No. 1 if the four brands are really alike. It is clear that brand
No. 1 did not receive significantly more than one-fourth of the doctors’
votes. If the size of the sample had been larger, there might have been
a significant difference in favor of brand No. 1. However, there is no
reason to believe that if N were larger, p would still be 0.50.


TABLE 25.2

Probability of Each Term in the Expression (0.75B + 0.25A)^8

a = number of occurrences (number preferring brand #1)
p = proportion of occurrences (proportion preferring brand #1)

    a        p        Expression                Probability

    0      0          (0.75B)^8                   0.1001
    1      0.125      8(0.75B)^7(0.25A)           0.2670
    2      0.250      28(0.75B)^6(0.25A)^2        0.3115
    3      0.375      56(0.75B)^5(0.25A)^3        0.2076
    4      0.500      70(0.75B)^4(0.25A)^4        0.0865
    5      0.625      56(0.75B)^3(0.25A)^5        0.0231
    6      0.750      28(0.75B)^2(0.25A)^6        0.0038
    7      0.875      8(0.75B)(0.25A)^7           0.0004
    8      1.000      (0.25A)^8                   0.0000

  Total                                           1.0000


Note that in the foregoing we considered only the last five terms of the
binomial, the terms for which p − π ≥ +0.25. We ignored the first
term, which is the only one for which p − π ≤ −0.25. The reason for
making such a one-tail test is that we were interested in knowing whether
the preferences for brand No. 1 significantly exceeded π = 0.25.
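
The one-tail probability quoted above can be checked with a few lines of Python. This is only a sketch: the function name `upper_tail` is ours, not the text's, and it does nothing more than add up the binomial terms of Table 25.2 for a ≥ 4, with N = 8 and π = 0.25.

```python
from math import comb

def upper_tail(n, pi, a_min):
    """Sum the binomial terms for a = a_min, ..., n (a one-tail probability)."""
    return sum(comb(n, a) * pi**a * (1 - pi)**(n - a) for a in range(a_min, n + 1))

# N = 8 and pi = 0.25 under the hypothesis that the four brands are alike.
p_four_or_more = upper_tail(8, 0.25, 4)
print(round(p_four_or_more, 4))  # 0.1138
```

Summing from a = 0 instead of a = 4 should return 1, which is a quick check that all nine terms have been included.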

An approximate test, π ≠ 0.50. While at an Arabian horse ranch,
the writer was told: “All 30 of the mares had colts this season. This is
unusual, as only 70 to 80 per cent ordinarily have colts in a single season.”
Now N = 30, a = 30, p = 1.0, and, considering π to be 0.75, we are in a
position to state just how unusual an occurrence this was. We merely
need to evaluate the term which includes A^30 in the expression

$$(0.25B + 0.75A)^{30},$$

where, as before, A is an occurrence (birth of a colt) and B a non-occurrence.
That term has a probability of 0.000 18, or about 2 in 10,000, and
is a very surprising occurrence, indeed. The ranch owner did not assign
a reason for the surprising fecundity, but one would be justified in rejecting
the hypothesis that the observed p of 1.0 was based on a random
sample from the population represented by his past experience. Note
that, again, we have made a one-tail test, since we wished to know
whether p = 1.0 significantly exceeded π = 0.75.

Let us see whether the normal curve can be used as a substitute for the
skewed binomial. Since N = 30, the sample is fairly large. However,
π is 0.75 rather than 0.50, as was the case when the normal curve was used
before. We compute (with τ = 1 − π)

$$\sigma_p = \sqrt{\frac{\pi\tau}{N}} = \sqrt{\frac{(0.75)(0.25)}{30}} = 0.079,$$

$$\frac{x}{\sigma} = \frac{p - \pi}{\sigma_p} = \frac{1.00 - 0.75}{0.079} = 3.16.$$

From Appendix G we find that a value of $x/\sigma$ = 3.16 cuts off less than
0.000 97 but more than 0.000 69 of the area of a normal curve, in one tail.
This approximate procedure yields a probability which is much larger
than the exact procedure, but our conclusion concerning p is the same.
This prompts us to raise a question which is similar to one raised earlier:
When π ≠ 0.50, under what conditions may the normal curve be substituted
for the binomial and the same conclusion be arrived at concerning
the hypothesis? The problem is now more complex, since the answer
depends on: (1) the value of π, (2) the size of the sample, and (3) the
criterion of significance which is used. For our purposes it will be sufficient
to note, first, that when π ≠ 0.50, the normal curve is a less satisfactory
approximation to the binomial than when π = 0.50, for any
given N. In fact, when π ≠ 0.50, use of the normal curve will sometimes
yield a probability that is too small and sometimes one that is too large.
Second, Yates' correction can be of no assistance, since it is not designed
for situations in which π ≠ 0.50.
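
To see how far apart the exact and the approximate answers are for the mares, the two computations can be placed side by side. The sketch below assumes only Python's standard library; the variable names are ours, and the normal-curve area is obtained from `math.erfc` rather than from Appendix G.

```python
from math import erfc, sqrt

N, a, pi = 30, 30, 0.75
p = a / N

# Exact: the single binomial term that contains A^30.
exact = pi ** N

# Normal approximation: x/sigma = (p - pi) / sigma_p, then the one-tail area beyond it.
sigma_p = sqrt(pi * (1 - pi) / N)
z = (p - pi) / sigma_p
normal_tail = 0.5 * erfc(z / sqrt(2))

print(f"exact = {exact:.6f}, x/sigma = {z:.2f}, normal tail = {normal_tail:.6f}")
```

The exact probability is roughly 0.000 18, while the normal-curve area falls inside the bracket of 0.000 69 to 0.000 97 read from the appendix, several times larger.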

Tables for the exact test when π ≠ 0.50. For situations in which
π ≠ 0.50, we need a series of tables, similar to Table 25.1, each table
having to do with a different π value. Such an undertaking is too ambitious
for an elementary text, and, in any event, the values of the terms of
skewed binomials may be obtained from the two references cited in footnote 9.
For purposes of illustration, Table 25.3 has been prepared,
dealing with the probability points for samples of various sizes when
π = 0.20 or π = 0.80.
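
Where printed tables are not at hand, a row of such a table can be regenerated directly from the binomial. The following is a minimal sketch, not the procedure used to prepare the printed tables: the function names are hypothetical, and the rule coded is the one stated in the table notes (a lower point is the largest a whose cumulated probability from below does not exceed the stated figure; an upper point is the smallest a whose cumulated probability from above does not exceed it).

```python
from math import comb

def binom_pmf(n, pi, a):
    """Probability of exactly a occurrences in n trials."""
    return comb(n, a) * pi**a * (1 - pi)**(n - a)

def probability_points(n, pi, tail):
    """Return (lower, upper) values of a for one tail probability.

    lower: largest a with P(X <= a) <= tail, or None if even a = 0 is too probable.
    upper: smallest a with P(X >= a) <= tail, or None if even a = n is too probable.
    """
    pmf = [binom_pmf(n, pi, a) for a in range(n + 1)]

    lower, cum = None, 0.0
    for a in range(n + 1):            # cumulate upward from a = 0
        cum += pmf[a]
        if cum > tail:
            break
        lower = a

    upper, cum = None, 0.0
    for a in range(n, -1, -1):        # cumulate downward from a = n
        cum += pmf[a]
        if cum > tail:
            break
        upper = a

    return lower, upper

print(probability_points(10, 0.50, 0.025))  # (1, 9)
print(probability_points(7, 0.20, 0.025))   # (None, 5)
```

A `None` corresponds to a blank cell: for that N no value of a is improbable enough to reach the stated point.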


Confidence Limits of π

Sometimes the value of p is known, but π is not known, and it is important
to state the limits within which π may be expected to occur. As was
noted when discussing the confidence limits of we must first decide 



TABLE 25.3 


Values of a at Selected Lower and Upper Probability Points 
for Specified Values of N 

π = 0.20

Notes for the use of this table: (1) each a value shown for a lower probability point,
together with all a values smaller than the one shown, has the indicated probability or less;
(2) each a value shown for an upper probability point, together with all a values larger
than the one shown, has the indicated probability or less; (3) this table may be used
when π = 0.80 by reading N − a for a and reversing the lower and upper points.



P ^ 0.05 

r g 0.02 

P ^ 0.01 

P g 0.001 

N 

Lower 

Upper 

Lower 

Upper 

Lower 

Upper 

Lower 

Upper 


0.025 

0.025 

0.01 

0.01 

0.005 

0.005 

0.0005 

0.0005 


point 

point 

point 

point 

point 

point 

point 

point 

3 


3 


3 





4 


4 

• 

4 


4 



5 


4 


4 


5 


5 

6 


4 


5 


6 


6 

7 


5 


5 


5 


6 

8 


6 


6 


6 


7 

9 


5 


6 


6 


7 

10 


6 


6 


7 


8 
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6 


7 


7 


8 
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7 
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9 

13 


7 


7 


8 


9 

14 


7 


S 


8 


9 

15 


7 


8 


S 


10 

16 


8 


8 


9 


10 

17 

0 

8 


9 


9 


10 
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0 

8 


9 


9 


11 

19 

0 

8 


9 


10 


11 

20 

0 

9 


9 


10 


12 

21 

0 

9 

0 

10 


10 


12 

22 

0 

9 

0 

10 


11 


12 

23 

0 

10 

0 

10 


11 


13 

24 

0 

10 

0 

11 

0 

n 


13 

25 

0 

10 

0 

11 

0 

12 


13 

26 

1 

10 

0 

11 

0 

12 


14 

27 

1 

11 

0 

12 

0 

12 


14 

28 

1 

11 

0 

12 

0 

12 


14 

29 

1 

11 

0 

12 c 

0 

13 


15 

30 

1 

12 

0 

12 

0 

13 


15 

31 

1 

12 

1 

13 

0 

13 


15 

32 

1 

12 

1 

13 

0 

14 


16 

33 

1 

12 

1 

13 

0 

14 


16 

34 

2 

13 

1 

14 

1 

; 14 


16 

35 

2 

13 

1 

14 

! 1 

1 15 

0 

17 

36 

2 

13 

1 

14 

1 

15 

0 

17 

37 

2 

13 

1 

14 

1 

i 15 

0 

17 

38 

2 

14 

1 

15 

1 

15 

0 

17 

39 

2 

14 

2 

16 

1 

i 

0 

18 

40 

2 

14 

2 

1 15 

1 

16 

0 1 

18 

41 

3 

14 

2 

16 

1 

16 

0 

18 

42 

3 

15 

2 

16 

1 

17 

0 

19 

43 

3 

15 

2 

16 

2 

17 

0 

19 

44 

3 

15 

2 

16 

2 

17 

0 

19 

45 

3 

15 

2 

17 

2 

17 

0 

20 

46 

3 

16 

2 

17 

2 

18 

1 

20 

47 

3 

16 

3 

17 

2 

18 

1 

20 

48 

4 

16 

3 

17 i 

2 

18 

1 

21 

49 

4 

17 

3 

18 

2 

18 

1 

21 

50 
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17 

3 
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2 1 

19 

1 

21 

55 
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18 

4 

19 1 

3 

20 

1 

23 
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19 

4 

21 

4 1 

21 

2 

24 

65 

6 

21 

5 

22 1 

4 

23 

3 

25 

70 i 

7 

22 

6 

23 

5 

24 

3 

27 

75 

8 

23 

6 

24 

6 

25 

4 

28 

80 

8 

24 

7 

26 

6 

27 

4 

30 

85 

9 

25 

8 

27 

7 

28 

5 

31 

m 

10 

27 

9 

28 

8 

29 

0 

32 

95 

11 

28 

9 

29 

9 

31 

6 

34 

100 

U 

29 

10 

31 

9 

32 

. 7 
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what confidence limits we want. Of course, the size of the sample from 
which p was computed must also be known. We shall proceed by con- 
sidering first an approximate method and then an exact method. 

An approximate method. After nearly 23 years of use, the Chicago,
Milwaukee, St. Paul & Pacific Railway found that 22 out of 50 red oak
ties, which had been preserved by means of creosote applied by the “full
cell” process, were still in good condition. For this sample, N = 50,
a = 22, and p = 0.44. What are the 95 per cent confidence limits of π?
To obtain these two values, we employ the expression which has been
used before,

$$\frac{x}{\sigma} = \frac{p - \pi}{\sigma_p},$$

but write it

$$\frac{x}{\sigma} = \frac{p - \pi}{\sqrt{\dfrac{\pi - \pi^2}{N}}}.$$

We know p and N. From Appendix H or the last row of Appendix I, we
obtain the $x/\sigma$ value (1.96) associated with the 95 per cent confidence limits.
The three known values are substituted in the equation just given, and it
is solved for π, giving:

$$1.96 = \frac{0.44 - \pi}{\sqrt{\dfrac{\pi - \pi^2}{50}}};$$

$$3.8416 = \frac{0.1936 - 0.88\pi + \pi^2}{\dfrac{\pi - \pi^2}{50}};$$

$$\frac{3.8416\pi - 3.8416\pi^2}{50} = 0.1936 - 0.88\pi + \pi^2;$$

$$0.076832\pi - 0.076832\pi^2 = 0.1936 - 0.88\pi + \pi^2;$$

$$0.1936 - 0.956832\pi + 1.076832\pi^2 = 0;$$

$$\pi = \frac{0.671125}{2.153664} \text{ and } \frac{1.242539}{2.153664}, \text{ so that}$$

$$\pi_1 = 0.312 \text{ and } \pi_2 = 0.577.$$

The data are from Proceedings of the American Wood Preservers Association, 1935,
pp. 133-134.

The quadratic 0.1936 − 0.956832π + 1.076832π² = 0 is solved by computing

$$\pi = \frac{-(-0.956832) \pm \sqrt{(0.956832)^2 - 4(0.1936)(1.076832)}}{2(1.076832)}.$$

If the first equation were to be written

$$1.96 = \frac{a - \pi N}{\sqrt{N(\pi - \pi^2)}},$$

we would, initially, have only integers on the right.
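
The algebra above reduces to one quadratic in π, so the two limits can be obtained mechanically. The sketch below is illustrative only; the function name `approx_limits` and its argument names are assumptions of ours. (Limits obtained by solving this quadratic are often referred to elsewhere as the Wilson score interval.)

```python
from math import sqrt

def approx_limits(p, n, z):
    """Solve z = (p - pi) / sqrt((pi - pi**2) / n) for the two values of pi."""
    k = z * z / n              # 3.8416 / 50 = 0.076832 in the text
    a_coef = 1 + k             # coefficient of pi^2  (1.076832)
    b_coef = -(2 * p + k)      # coefficient of pi    (-0.956832)
    c_coef = p * p             # constant term        (0.1936)
    root = sqrt(b_coef**2 - 4 * a_coef * c_coef)
    return (-b_coef - root) / (2 * a_coef), (-b_coef + root) / (2 * a_coef)

pi_1, pi_2 = approx_limits(0.44, 50, 1.96)
print(round(pi_1, 3), round(pi_2, 3))  # 0.312 0.577
```

Squaring both sides is what makes the single equation yield both limits: the smaller root is the lower limit and the larger root is the upper limit.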

What we did was to determine: (1) π₁ = 0.312, which is so located that
p = 0.44 cuts off the upper 2½ per cent tail of a normal curve around π₁ with

$$\sigma_p = \sqrt{\frac{\pi_1\tau_1}{N}} = \sqrt{\frac{(0.312)(0.688)}{50}} = 0.066,$$

and (2) π₂ = 0.577, which is so located that p = 0.44 cuts off the lower
2½ per cent tail of a normal curve around π₂ with

$$\sigma_p = \sqrt{\frac{\pi_2\tau_2}{N}} = \sqrt{\frac{(0.577)(0.423)}{50}} = 0.070.$$

Chart 25.3 illustrates what has been done.

Chart 25.3. 95 Per Cent Confidence Limits of π, when p = 0.44 and N = 50,
Determined by Use of $\sigma_p$ and Normal Curves. The cross-hatched area is 2.5 per
cent of the left curve; the stippled area is 2.5 per cent of the right curve.

The method just described gives satisfactory results when N is large 
and when p is not too far removed from 0.50. Its shortcoming will be 
apparent when we apply it to the following example. 

Standard-strength digitalis was injected into each of 20 frogs. As a
result, 17 of them had rapid systolic standstills (they died). Other frogs
were injected with half-strength digitalis and with digitalis alleged to be
half-strength, but the results of those tests are of no concern to us in
connection with this example. For the group of frogs given full-strength
digitalis, N = 20 and p = 0.85. What are the 90 per cent confidence
limits of π? Proceeding as before, we first obtain the $x/\sigma$ value of 1.645
from the last row of Appendix I, and then write

$$1.645 = \frac{0.85 - \pi}{\sqrt{\dfrac{\pi - \pi^2}{20}}},$$

which, when solved, yields

π₁ = 0.678 and π₂ = 0.938.

These results seem all right until we look at Chart 25.4, which shows what
we have done. Now, it is immediately apparent that the use of normal
curves cannot be justified, particularly for determining π₂. The normal
curve at the right indicates that values of p > 1.0 would occur, which is,
of course, impossible.

Chart 25.4. Unsatisfactory Approximation of the 90 Per Cent Confidence
Limits of π when p = 0.85 and N = 20, Determined by Use of $\sigma_p$ and
Normal Curves. The cross-hatched area is 5 per cent of the left curve; the
stippled area is 5 per cent of the right curve. (Horizontal axis: values of p.)

The exact method. An exact determination of the confidence limits
of π for the full-strength digitalis data requires a much more laborious
procedure. Considering first the determination of π₁, we must ascertain
the value of π which, when inserted in the expression

$$(\tau B + \pi A)^{20},$$

will result in a = 17 (p = 0.85) cutting off the upper 5 per cent tail of the
binomial. This requires successive approximations, and we shall first try
π = 0.65. From Table 25.4, it may be seen that, in the binomial
(0.35B + 0.65A)^20, the probability of obtaining a ≥ 17 is 0.0444. Since
this probability is less than 0.05, we must try a slightly larger value of π.
In the same table, it appears that, when π = 0.66, the probability of
obtaining a ≥ 17 is 0.0535. If two decimals are sufficient for π₁, we
would conclude that the lower 90 per cent confidence limit of π is 0.66, as
shown in the upper part of Chart 25.5. In the event that three decimals
are wanted for π₁, we would note that the next value to be tried for π₁
should be larger than 0.655. A value of 0.657 was tried, with the results
shown in the sixth and seventh columns of Table 25.4; for a ≥ 17, the
probability is seen to be 0.0506. Trying, next, π = 0.656, it is seen from
the table that the probability of a ≥ 17 is 0.0497. The value of π₁ lies
between 0.656 and 0.657, but closer to 0.656 than to 0.657.

Chart 25.5. 90 Per Cent Confidence Limits of π when N = 20 and
a = 17 (p = 0.85), Determined by Use of the Expression (τB + πA)^20. Data
from Tables 25.4 and 25.5. (Vertical axis of both panels: probability.)

In order to obtain π₂, the upper 90 per cent confidence limit, we need to
determine the value of π which, when inserted in the expression

$$(\tau B + \pi A)^{20},$$

will result in a = 17 (p = 0.85) cutting off the lower 5 per cent tail of the
binomial. Since π₂ was 0.938 by the approximate method, we shall first
try π = 0.94. From Table 25.5, it is seen that a ≤ 17 includes 0.1150
of the binomial, and we next try π = 0.95. This value for π₂ results in a
probability of 0.0755 (see Table 25.5) for a ≤ 17, so we proceed to try
π = 0.96, which, as shown in Table 25.5, gives a probability of 0.0439 for

TABLE 25.4

Probabilities* and Cumulative Probabilities of Values of a in the Expression
(τB + πA)^20 when π = 0.65, 0.66, 0.657, and 0.656

(The probability of a ≥ 17 is shown in boldface type.)

           π = 0.65            π = 0.66            π = 0.657           π = 0.656
  a     Proba-  Cumulative   Proba-  Cumulative   Proba-  Cumulative   Proba-  Cumulative
        bility probability   bility probability   bility probability   bility probability
 (1)      (2)        (3)       (4)        (5)       (6)        (7)       (8)        (9)

   0    0.0000     1.0000    0.0000     1.0000
   1    0.0000    >0.9999    0.0000    >0.9999
   2    0.0000    >0.9999    0.0000    >0.9999
   3    0.0000    >0.9999    0.0000    >0.9999
   4    0.0000    >0.9999    0.0000    >0.9999
   5    0.0003    >0.9999    0.0002    >0.9999
   6    0.0012     0.9997    0.0009     0.9998      Probabilities for a = 0 to a = 16
   7    0.0045     0.9985    0.0034     0.9989      are not needed for this problem.
   8    0.0136     0.9940    0.0108     0.9955
   9    0.0336     0.9804    0.0280     0.9846
  10    0.0686     0.9468    0.0598     0.9566
  11    0.1158     0.8782    0.1056     0.8968
  12    0.1614     0.7624    0.1537     0.7913
  13    0.1844     0.6010    0.1836     0.6376
  14    0.1712     0.4166    0.1782     0.4540
  15    0.1272     0.2454    0.1384     0.2758
  16    0.0738     0.1182    0.0839     0.1374
  17    0.0323     0.0444    0.0383     0.0535    0.0364     0.0506    0.0358     0.0497
  18    0.0100     0.0121    0.0124     0.0152    0.0116     0.0142    0.0114     0.0139
  19    0.0020     0.0021    0.0025     0.0028    0.0023     0.0026    0.0023     0.0025
  20    0.0002     0.0002    0.0002     0.0002    0.0002     0.0002    0.0002     0.0002

* The non-cumulative probabilities may be computed as in Table 23.8. When π consists of not
more than two decimals, probabilities and cumulative probabilities may be obtained from National
Bureau of Standards, Tables of the Binomial Probability Distribution, Washington, 1949. The
cumulative figures shown above were obtained from the non-cumulative figures before the
non-cumulative figures were rounded.


a ≤ 17. We conclude that π₂ = 0.96, and this is illustrated in the lower
part of Chart 25.5. Values of π intermediate between 0.95 and 0.96
could be tried, but we shall terminate the illustration at this point. The
90 per cent confidence limits (to two decimals) are π₁ = 0.66 and
π₂ = 0.96.

The exact method of determining the confidence limits of π necessitates
two sets of trials for each different problem. Note that, in order to make
a useful estimate of the values of π₁ and π₂ which should be tried first,
the approximate solution using $\sigma_p$ should ordinarily precede the exact
solution. If binomial tables, such as those mentioned in footnote 9, are
available, the approximate solution may be omitted.
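
The two sets of trials described above amount to a search for the π at which a tail of the binomial equals 0.05, and a machine can carry the successive approximations as far as desired. The sketch below uses simple interval halving; it is one possible mechanization under our own naming (`tail_ge`, `tail_le`, `bisect` are hypothetical helpers), not the hand procedure of the text.

```python
from math import comb

def tail_ge(n, pi, a):
    """P(X >= a) for the binomial (tau*B + pi*A)^n."""
    return sum(comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(a, n + 1))

def tail_le(n, pi, a):
    """P(X <= a) for the same binomial."""
    return sum(comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(0, a + 1))

def bisect(f, lo, hi, tol=1e-7):
    """Find pi in [lo, hi] with f(pi) = 0, assuming f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == (f_lo > 0):
            lo = mid          # root lies in the upper half
        else:
            hi = mid          # root lies in the lower half
    return (lo + hi) / 2

n, a, tail = 20, 17, 0.05
pi_1 = bisect(lambda pi: tail_ge(n, pi, a) - tail, 0.0, 1.0)  # lower limit
pi_2 = bisect(lambda pi: tail_le(n, pi, a) - tail, 0.0, 1.0)  # upper limit
print(round(pi_1, 4), round(pi_2, 4))
```

The lower limit found this way should fall between the trial values 0.656 and 0.657, and the upper limit between 0.95 and 0.96, in agreement with Tables 25.4 and 25.5.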



TABLE 25.5

Probabilities* and Cumulative Probabilities of Values of a in the Expression
(τB + πA)^20 when π = 0.94, 0.95, and 0.96

(The probability of a ≤ 17 is shown in boldface type.)

           π = 0.94            π = 0.95            π = 0.96
  a     Proba-  Cumulative   Proba-  Cumulative   Proba-  Cumulative
        bility probability   bility probability   bility probability
 (1)      (2)        (3)       (4)        (5)       (6)        (7)

   0    0.0000     0.0000    0.0000     0.0000    0.0000     0.0000
   1    0.0000     0.0000    0.0000     0.0000    0.0000     0.0000
   .
   .    All omitted probabilities are zero, to four decimals.
   .
  12    0.0000     0.0000    0.0000     0.0000    0.0000     0.0000
  13    0.0001     0.0001    0.0000     0.0000    0.0000     0.0000
  14    0.0008     0.0009    0.0003     0.0003    0.0001     0.0001
  15    0.0048     0.0056    0.0022     0.0026    0.0009     0.0010
  16    0.0233     0.0290    0.0133     0.0159    0.0065     0.0074
  17    0.0860     0.1150    0.0596     0.0755    0.0365     0.0439
  18    0.2246     0.3395    0.1887     0.2642    0.1458     0.1897
  19    0.3703     0.7099    0.3774     0.6415    0.3683     0.5580
  20    0.2901     1.0000    0.3585     1.0000    0.4420     1.0000

* See footnote to Table 25.4.
To avoid the arduous labor of expanding a number of binomials,
diagrams have been prepared by Clopper and Pearson which enable one
to read the lower and upper 0.95 and 0.99 confidence limits of π. These
are shown in Charts 25.6 and 25.7.

Chart 25.7. 99 Per Cent Confidence Limits of π for Values of p from
Samples of Various Sizes from 10 to 1,000. Reproduced, by permission, from
C. J. Clopper and E. S. Pearson, “The Use of Confidence or Fiducial Limits,”
Biometrika, Vol. 26, p. 410. By correspondence Pearson advises that
the π values “are not completely accurate as the levels at certain
points were obtained by interpolation and not by direct calculation.”
(Horizontal axis: values of p.)

Significance of the Difference Between p₁ and p₂

An approximate method. Reference was made earlier to 50 red oak
ties which had been preserved by means of creosote applied by the “full
cell” process. After 23 years of service, 22, or 44 per cent, of these ties
were still in service. When these ties were laid, another group of 50 red
oak ties, creosote-impregnated by the Rueping process, were also put
into use. Of this second group, 18 ties, or 36 per cent, were still in
service after the passage of 23 years. Now we have two samples: one,
on which the “full cell” process was used, had N₁ = 50, a₁ = 22, and
p₁ = 0.44; the other, on which the “Rueping” process was employed,
had N₂ = 50, a₂ = 18, and p₂ = 0.36. We wish to know whether there
is a significant difference, at the 0.05 level, between these two proportions.

The data are from Proceedings of the American Wood Preservers Association, 1935,
pp. 133-134.

The procedure is essentially the same as that used for two sample
means; we shall compare the difference to the standard error of the
difference. The standard error of the difference between two percentages
is

$$\sigma_{p_1 - p_2} = \sqrt{\frac{\pi\tau}{N_1} + \frac{\pi\tau}{N_2}}.$$

Now, we do not know π, and, if we did know π, we would almost certainly
wish to test p₁ against π and p₂ against π rather than to examine the
significance of p₁ − p₂. Since we do not know π, we make an estimate,
$\bar{p}$, based on the information in both samples. Thus,

$$\bar{p} = \frac{a_1 + a_2}{N_1 + N_2} = \frac{22 + 18}{50 + 50} = 0.40.$$

Now we are in a position to compute, with $\bar{q} = 1 - \bar{p}$,

$$\sigma_{p_1 - p_2} = \sqrt{\frac{\bar{p}\bar{q}}{N_1} + \frac{\bar{p}\bar{q}}{N_2}}
= \sqrt{\frac{(0.40)(0.60)}{50} + \frac{(0.40)(0.60)}{50}} = 0.098, \text{ and}$$

$$\frac{x}{\sigma} = \frac{p_1 - p_2}{\sigma_{p_1 - p_2}} = \frac{0.44 - 0.36}{0.098} = \frac{0.08}{0.098} = 0.82.$$

Referring to Appendix H, it appears that P = 0.41, and we conclude that
the difference between p₁ and p₂ is not significant.
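
The steps of this test are easily strung together. The sketch below is an illustration under our own naming (`two_proportion_test` is hypothetical); it pools the two samples to estimate π, forms the standard error of the difference, and reads the two-tail normal-curve area from `math.erfc` in place of Appendix H.

```python
from math import erfc, sqrt

def two_proportion_test(a1, n1, a2, n2):
    """Return (x/sigma, two-tail P) for the difference between two sample proportions."""
    p1, p2 = a1 / n1, a2 / n2
    p_bar = (a1 + a2) / (n1 + n2)        # pooled estimate of pi
    q_bar = 1 - p_bar
    se = sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))     # area in both tails of the normal curve
    return z, p_value

z, p_value = two_proportion_test(22, 50, 18, 50)
print(round(z, 2), round(p_value, 2))  # 0.82 0.41
```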

Exact method. When the two samples from which p₁ and p₂ are
obtained are small, the approximate method just described should be
abandoned in favor of the exact method. Later in this chapter it will be
shown that a chi-square test for a “2 × 2” table is identical with the
p₁ − p₂ test given above. At that point the exact test will be described.
PART 2: THE CHI-SQUARE TEST

As we shall use it, in the present discussion, the χ² test consists of
summing a series of ratios, each ratio having been obtained by: (1) taking
the difference between an observed frequency ($f$) and an associated
population or computed frequency ($f_c$), (2) squaring this difference, and
(3) dividing the squared difference by $f_c$. Thus,

$$\chi^2 = \sum \frac{(f - f_c)^2}{f_c}.$$


In Chapter 26 we shall make use of a slightly different aspect of chi-square 
when we compare and 


The 1 × 2 Table

Approximate method. To demonstrate the identity of the χ² test
and the p − π (or a − πN) test, we shall use the example employed
earlier in this chapter which involved a sample of 10 marbles, 9 of which
were black. Using 0.05 as our criterion, we tested the hypothesis that
the sample was a random one from a population having π = 0.50 by use
of $\sigma_p$ and also by use of $\sigma_a$. If we make the same test by means of χ²,
we compute:

                 Observed     Computed
   Color of       number     number if 1:1
   marble       of marbles   ratio exists    f − f_c   (f − f_c)²   (f − f_c)²/f_c
                    f            f_c

   Black            9             5            +4         16            3.2
   White            1             5            −4         16            3.2

   Total           10            10             0                       6.4

This is a 1 × 2 table, since the observed frequencies occupy 1 column and
2 rows. It is the simplest type of a one-column table. From the above
table, the value of χ² is seen to be 6.4, and we may determine the probability
of such a value of χ² (or greater) by referring to the table of Appendix J
for the appropriate number of degrees of freedom. For our problem,
n = 1. This is so because a figure may be freely entered in one of the
two boxes in the f-column. However, once this figure has been put down,
the second figure is thereupon determined, since the total is 10. From
Appendix J, when n = 1 and χ² = 6.4, the value of P is seen to be
slightly larger than 0.01, causing us to reject the hypothesis on the basis
of this approximate test. If a more detailed table of χ² values were
available, we would find P = 0.0114, exactly the same as for the test
involving $\sigma_p$ (or $\sigma_a$). As a matter of fact, the p − π test (or the a − πN
test) and the χ² test must produce the same final P value. Note that the
$x/\sigma$ value obtained for the p − π (or a − πN) test is the square root of the
χ² value. This can be seen in broader perspective if we look at the last
row of the t-table (Appendix I), which gives $x/\sigma$ values for the normal
distribution, and the first row of the χ² table (Appendix J), which gives
χ² values when n = 1. For any given P value, the χ² value will be seen
always to be the square of the normal value.

We can also obtain this probability by looking up χ in the normal-curve
table of Appendix H.

The values of χ² shown in the first row of Appendix J are obtained from
the distribution of χ² for one degree of freedom, which is pictured in Chart
25.8.

The χ² test tells us the probability of getting a disagreement between
observed and computed frequencies equal to or greater than that observed,
in either direction. For the marbles, the P value of a little more than 0.01
represented the probability of 9 or 10 black marbles and of 9 or 10 white
marbles. This is true even though only one tail of the chi-square distribution
(see Appendix J) is involved, because the f − f_c values were
squared.
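
The marble computation, and the statement that for n = 1 the probability equals the two-tail normal-curve area beyond the square root of χ², can be verified briefly. In this sketch the function names are ours, and `math.erfc` stands in for the detailed table the text refers to.

```python
from math import erfc, sqrt

def chi_square(observed, computed):
    """Sum (f - f_c)^2 / f_c over the boxes."""
    return sum((f - fc) ** 2 / fc for f, fc in zip(observed, computed))

def p_one_degree(chi2):
    """P for n = 1: both tails of the normal curve beyond sqrt(chi2)."""
    return erfc(sqrt(chi2 / 2))

chi2 = chi_square([9, 1], [5, 5])
print(round(chi2, 1), round(p_one_degree(chi2), 4))  # 6.4 0.0114
```

The helper `p_one_degree` applies only when there is a single degree of freedom; for larger n a χ² table or a general distribution routine would be needed.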

Exact method. Chi-square is an approximate test for the same
reason that the p − π (or a − πN) test was an approximate test; a continuous
distribution of sample values was assumed to exist, when actually
only the eleven terms of the binomial (0.50B + 0.50A)^10 can occur. The
exact procedure was set forth on pages 661-663 and it will not be repeated
here. The approximate method, using χ², may be employed in place of
the exact method, and the same conclusion arrived at, under exactly the
same conditions that the p − π (or a − πN) test may be used. These
conditions were discussed for π = 0.50 on pages 666-668 and for π ≠ 0.50
on page 671.


Chart 25.8. The χ² Distribution for n = 1, n = 2, n = 5, and n = 10.
Note that different scales are used for the two parts of the chart. The ordinates were
computed from the expression

$$y = \frac{(\chi^2)^{\frac{n-2}{2}}\, e^{-\frac{\chi^2}{2}}}{2^{\frac{n}{2}}\,\Gamma\!\left(\frac{n}{2}\right)},$$

which is not difficult to solve if logarithms are used. The mode of the χ² distribution
is at χ² = n − 2, except when n = 1, and then the mode is at zero, as may be seen
above; the mean is at χ² = n. As shown in the lower part of the chart, the skewness
of the distribution decreases as the number of degrees of freedom increases.
(Axes: value of χ²; height of ordinate.)
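
The statement about the mode can be checked numerically. The sketch below assumes the standard form of the χ² density, written with the gamma function from Python's `math` module; the function name `chi_square_ordinate` and the grid search are ours and serve only as an illustration.

```python
from math import exp, gamma

def chi_square_ordinate(chi2, n):
    """Height of the chi-square curve at chi2 (> 0) for n degrees of freedom."""
    return chi2 ** ((n - 2) / 2) * exp(-chi2 / 2) / (2 ** (n / 2) * gamma(n / 2))

# For n = 5 the highest ordinate should be found at chi2 = n - 2 = 3.
grid = [i / 100 for i in range(1, 2001)]
mode = max(grid, key=lambda x: chi_square_ordinate(x, 5))
print(mode)  # 3.0
```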
Confidence limits of π. As a matter of possible interest, it may be
noted that χ² may be used to determine the confidence limits of π. The
expression is



and it is the exact equivalent of the approximate method given earlier. 

The 2 × 2 Table

Approximate method. As will shortly be made clear, the χ² test
for a 2 × 2 table leads to the same probability, and therefore the same
conclusion concerning a hypothesis, as does the p₁ − p₂ test described
earlier. To clarify this point, we shall use the same illustration that was
used for the p₁ − p₂ test. The data are now set up as in Table 25.6,
which we call a 2 × 2 table because it has two columns and two rows of
observed data. Two-column tables with more than two rows will be
considered later.

TABLE 25.6

Railroad Ties in Use at End of 23-Year Test Period
by Method Used to Apply Creosote Preservative

   Process by which         In use at end
   creosote was             of test period
   applied                  Yes        No        Total

   Full cell                 22        28          50
   Rueping                   18        32          50

   Total                     40        60         100

Data from Proceedings of the American Wood Preservers Association,
1935, pp. 133-134.

There are no population frequencies in Table 25.6, but we obtain computed
frequencies by noting that, if the ties treated by the two processes
showed no difference in regard to the number in use at the end of the test
period, we would expect the first box (Row 1, Column 1) to contain 40/100
of the 50 ties treated by the full cell process, and the second box (Row 1,
Column 2) would be expected to have 60/100 of the 50 ties treated by the
same process. In like fashion, the third box (Row 2, Column 1) would
have 40/100 of the 50 ties treated by the Rueping process and the fourth box
(Row 2, Column 2) would have 60/100 of the ties treated by this process.
These f_c values have been computed in Columns (2) and (3) of Table
25.7. In Columns (4), (5), (6), and (7) of that table, the computation
of χ² is carried out and χ² = 0.67. A 2 × 2 table, with marginal totals
set, has n = 1, as will be explained in the next paragraph. Referring to
Appendix J for n = 1 and χ² = 0.67 gives 0.30 < P < 0.50. A more
detailed table of χ² would show P = 0.41, the same as for the p₁ − p₂
test. Note, again, that the $x/\sigma$ value for the p₁ − p₂ test, which was 0.82
(or 0.816 to three decimals), is the square root of the χ² value of 0.67.

TABLE 25.7

Computation of χ² for Data of Table 25.6

                      Determination of computed
                             frequencies
                      Product of row        f_c
   Cell                 and column      [Col. (2)      f     f − f_c   (f − f_c)²   (f − f_c)²/f_c
                          totals         ÷ 100]
   (1)                     (2)            (3)         (4)     (5)        (6)           (7)

   Row 1, column 1    50 × 40 = 2,000      20          22      +2          4          0.20
   Row 1, column 2    50 × 60 = 3,000      30          28      −2          4          0.133
   Row 2, column 1    50 × 40 = 2,000      20          18      −2          4          0.20
   Row 2, column 2    50 × 60 = 3,000      30          32      +2          4          0.133

   Total                  . . .           100         100       0                     0.67

When the f_c entries are not integers, they should be carried to one decimal in order that Σf_c will not
differ from Σf by as much as 1. Actually, only one of the f_c figures in Column (3) must be computed.
The others may be obtained by subtraction from the row and column totals of Table 25.6.


That n = 1 for a 2 × 2 table with marginal totals set may be clarified
by considering this small table:

            |         |   100
            |         |   150
       130  |   120   |   250

which has the marginal totals given, but has no entries in the boxes. If
a figure is entered in any one box, it should be clear that the figures for the
other 3 boxes are thereupon determined. If 20 is written in the first box,
then the figure for the second box must be 80, for the third box 110, and
for the fourth box 40. Inasmuch as we were free to enter a figure in only
one box, there is but one degree of freedom. For tables larger than 2 × 2,
the same method will tell one the number of degrees of freedom if the
marginal totals are set. It is more expeditious, however, merely to
compute

$$n = (R - 1)(C - 1),$$

where R is the number of rows and C is the number of columns. The
following relationship may be of interest:

   Degrees of freedom lost because of marginal totals    (R − 1) + (C − 1) + 1
   Degrees of freedom remaining, n                       (R − 1)(C − 1)

   Total (number of boxes)                               RC
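
The argument that one entry fixes the other three can be traced in a few lines. This is a small sketch with hypothetical helper names; it fills the boxes of the small table above from a single entry and the marginal totals.

```python
def fill_2x2(first_box, row_totals, col_totals):
    """Given one entry and the marginal totals, return the four box entries."""
    r1, r2 = row_totals
    c1, c2 = col_totals
    second = r1 - first_box      # rest of row 1
    third = c1 - first_box       # rest of column 1
    fourth = r2 - third          # rest of row 2 (equally, c2 - second)
    return first_box, second, third, fourth

def degrees_of_freedom(rows, cols):
    """n = (R - 1)(C - 1) when the marginal totals are set."""
    return (rows - 1) * (cols - 1)

print(fill_2x2(20, (100, 150), (130, 120)))  # (20, 80, 110, 40)
print(degrees_of_freedom(2, 2))              # 1
```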


The computation form shown in Table 25.7 is not required when $\chi^2$ is 
computed for a 2 × 2 table. It was given here in order to clarify the 
procedure involved. The value of $\chi^2$ for a 2 × 2 table may be obtained 
more expeditiously by use of the expression, 

$$\chi^2 = \frac{(a_1 b_2 - a_2 b_1)^2 N}{N_1 N_2 N_a N_b},$$

where the symbols refer to box and total frequencies as shown below: 

    a1      b1      N1
    a2      b2      N2
    Na      Nb      N


For the data of Table 25.6, 

$$\chi^2 = \frac{[(22)(32) - (28)(18)]^2\,100}{(50)(50)(40)(60)} = \frac{(704 - 504)^2\,100}{(2500)(2400)} = \frac{4{,}000{,}000}{6{,}000{,}000} = 0.67.$$


This, of course, is the same value as obtained in Table 25.7. 
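
As a check on the arithmetic, the shortcut expression and the long form of Table 25.7 can be compared in a short Python sketch. The function names are illustrative only; the frequencies are those of Table 25.6 as used in the computation above.

```python
def chi_square_2x2(a1, b1, a2, b2):
    """Shortcut: (a1*b2 - a2*b1)^2 * N / (N1 * N2 * Na * Nb)."""
    n1, n2 = a1 + b1, a2 + b2   # row totals
    na, nb = a1 + a2, b1 + b2   # column totals
    n = n1 + n2
    return (a1 * b2 - a2 * b1) ** 2 * n / (n1 * n2 * na * nb)


def chi_square_long_form(a1, b1, a2, b2):
    """Long form: sum of (f - fc)^2 / fc over the four boxes."""
    rows = (a1 + b1, a2 + b2)
    cols = (a1 + a2, b1 + b2)
    n = sum(rows)
    total = 0.0
    for i, row in enumerate(((a1, b1), (a2, b2))):
        for j, f in enumerate(row):
            fc = rows[i] * cols[j] / n   # computed frequency
            total += (f - fc) ** 2 / fc
    return total


short = chi_square_2x2(22, 28, 18, 32)
long_ = chi_square_long_form(22, 28, 18, 32)
print(round(short, 2), round(long_, 2))
```

Both routes give the same figure, which is why the worksheet is unnecessary for a 2 × 2 table.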

Exact procedure. When N is small, the probability given by the $\chi^2$ 
test is too small, with the result that the test might lead to a hypothesis 
being discredited, whereas the exact procedure might cause one not to 
discredit a hypothesis. 

A degree of freedom is not lost because of every marginal total. If any one 
vertical and any one horizontal total (including the grand total) are deleted, they 
may be restored from the information given by the remaining totals. 

Consider the following data dealing with two forms of treatment 
applied to 16 laboratory animals which had previously been inoculated 
with a virus. 

                      Result
Treatment     Recovered     Died     Total
#1                7           3        10
#2                0           6         6
Total             7           9        16

The figures for the two treatments appear so divergent 
that it may seem to the reader to be a waste of time to apply a statistical 
test. Nevertheless, using 0.01 as our criterion, let us see whether there 
is a significant difference between the two treatments. Our hypothesis 
is that the two groups, of 10 and 6 animals, are from the same population 
in respect to the proportions recovered or died. Using first the 
chi-square test, we get 


$$\chi^2 = \frac{(a_1 b_2 - b_1 a_2)^2 N}{N_1 N_2 N_a N_b} = \frac{[(7)(6) - (0)(3)]^2\,16}{(10)(6)(7)(9)} = 7.47.$$


Referring to Appendix J for n = 1, we find P < 0.01 and, upon the basis 
of this approximate test, would conclude that our hypothesis was 
discredited. However, the probability is actually larger than indicated by 
the $\chi^2$ test or by the $p_1 - p_2$ test, which we already know is the same 
as the $\chi^2$ test for this type of problem. 

The probability of any arrangement of frequencies in the boxes of a 
2 × 2 table, with marginal totals set, may be obtained from 

$$\frac{N_1!\,N_2!\,N_a!\,N_b!}{N!\,a_1!\,b_1!\,a_2!\,b_2!}.$$

Solving this expression for the data resulting from the two treatments 
gives 

$$\frac{10!\,6!\,7!\,9!}{16!\,7!\,3!\,0!\,6!} = 0.0105.$$


This is the probability of the particular divergence which was observed. 
If any greater differences between the two samples (treatments) are 
possible, their probabilities must be added to this. (It will be 
remembered that the $\chi^2$ test and the $p_1 - p_2$ test give us the probability of a 
difference equal to or greater than that which was observed.) The first column 
of Table 25.8 shows all the possible combinations that will produce the 
marginal totals of our problem. There are seven in all. From the 
second column it may be seen that none of the combinations shows a 
difference greater than and in the same direction as that which was 
observed. However, Combination VII shows a greater difference in the 
opposite direction. We therefore ascertain its probability, also, which is 
0.0009. Adding the two probabilities for Combinations I and VII gives 
0.0114 and leads us to a different conclusion from the one reached before: 
the hypothesis is not discredited. 
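
The exact procedure just described can be sketched in Python using only the standard library. The factorial expression is evaluated through binomial coefficients, which is algebraically the same thing; the function names are illustrative, and the rule for which combinations to add (a difference $p_1 - p_2$ at least as large as the observed one, in either direction) is the one used in the paragraph above.

```python
from math import comb


def table_probability(a1, n1, n2, na):
    """Probability of a 2 x 2 table with a1 in the first box, margins fixed.

    Equals N1! N2! Na! Nb! / (N! a1! b1! a2! b2!).
    """
    return comb(n1, a1) * comb(n2, na - a1) / comb(n1 + n2, na)


def exact_probability(a1, n1, n2, na):
    """Add the probabilities of every table whose |p1 - p2| is at least
    as large as the observed difference."""
    observed = abs(a1 / n1 - (na - a1) / n2)
    total = 0.0
    for k in range(max(0, na - n2), min(n1, na) + 1):
        diff = abs(k / n1 - (na - k) / n2)
        if diff >= observed - 1e-12:   # small tolerance for float comparison
            total += table_probability(k, n1, n2, na)
    return total


p_observed = table_probability(7, 10, 6, 7)   # Combination I
p_total = exact_probability(7, 10, 6, 7)      # Combinations I and VII
print(round(p_observed, 4), round(p_total, 4))
```

With row totals 10 and 6 and first-column total 7, the loop runs over the seven possible tables and keeps only the two extreme ones.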


Drawing conclusions concerning 2 × 2 tables with small frequencies may be 
facilitated by use of a table, prepared by D. J. Finney and R. Latscha, which shows 
values of $a_2$ significant at selected probability values when $a_1$, $N_1$, and $N_2$ are fixed. 
Provision is made for consideration of 2 × 2 tables ranging from $N_1 + N_2 = 6$ to 
$N_1 + N_2 = 30$. See E. S. Pearson and H. O. Hartley, Biometrika Tables for 
Statisticians, Cambridge University Press, Cambridge, England, 1954, pp. 65-72 and 
188-193. The table originally appeared in two parts in Biometrika, Vol. 35, parts 1 
and 2, and Vol. 40, parts 1 and 2. 

TABLE 25.8 

Values of p1, p2, and p1 - p2 and the Probability of Each of the Seven 
Combinations Yielding the Marginal Totals Shown Below 

(Each combination has row totals 10 and 6 and column totals 7 and 9; N = 16. 
The probability of a combination is N1! N2! Na! Nb! / (N! a1! b1! a2! b2!).) 

              Row 1       Row 2       Proportion of row total
Combination   a1   b1     a2   b2     in first column             Probability
                                      p1      p2      p1 - p2
I              7    3      0    6     0.70    0       +0.70         0.0105
II             6    4      1    5     0.60    0.17    +0.43         0.1101
III            5    5      2    4     0.50    0.33    +0.17         0.3304
IV             4    6      3    3     0.40    0.50    -0.10         0.3671
V              3    7      4    2     0.30    0.67    -0.37         0.1573
VI             2    8      5    1     0.20    0.83    -0.63         0.0236
VII            1    9      6    0     0.10    1.00    -0.90         0.0009
Total                                                               1.0000

As a matter of possible interest, Table 25.8 shows the probability of 
each of the seven combinations. Note that the seven probabilities add 
to 1.0000. Because of rounding, the seven figures shown in Table 25.8 
total 0.9999. 

If we had merely been interested in knowing whether treatment No. 1 
showed a larger proportion recovering than did treatment No. 2, we 
would have halved the probability arrived at by the $\chi^2$ test. This is 
"less than 0.005" and involves the assumption that the distribution of 
possible values is symmetrical, which is not the case. The correct 
probability is 0.0105, the probability shown in Table 25.8 for Combination I. 

In a practical situation, what should one do if, handling data such as 
those for the two treatments, he is confronted by the conclusion which 
was just arrived at? Further experimentation is certainly in order; 
possibly larger samples may result in the appearance of a significant 
difference, or, alternatively, may still fail to discredit the hypothesis. 

Yates' correction. This correction, previously mentioned in connection 
with the $a - \pi N$ test, may also be applied to the $\chi^2$ test for a 2 × 2 
table, when skewness is not present. The purpose is the same as 
before: to modify the approximate test so that the probability resulting 
from it will be in closer agreement with the exact test. Here too, Yates' 
correction tends to over-correct. For the data of the two treatments, 
the use of Yates' correction leads to a probability slightly larger than 
0.025, which greatly exceeds that obtained by the exact method. As 
stated before, the tendency to over-correct would sometimes lead us to 
the conclusion that a difference was not significant, whereas the exact 
procedure would indicate the presence of a significant difference. 
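
The effect of the correction on the two-treatment data can be sketched as follows. This is an illustration under stated assumptions, not the book's computing form: the correction is applied in its usual textbook form (each absolute deviation reduced by 0.5, floored at zero here as a safeguard), and the probability for one degree of freedom is obtained from the identity P = erfc(sqrt(chi-square / 2)) rather than from Appendix J. The function names are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(table, yates=False):
    """Chi-square for a 2 x 2 table, optionally with Yates' correction."""
    (a1, b1), (a2, b2) = table
    rows = (a1 + b1, a2 + b2)
    cols = (a1 + a2, b1 + b2)
    n = sum(rows)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            fc = rows[i] * cols[j] / n        # computed frequency
            dev = abs(f - fc)
            if yates:
                dev = max(dev - 0.5, 0.0)     # shrink each deviation by 0.5
            chi2 += dev ** 2 / fc
    return chi2


def probability_one_df(chi2):
    """Upper-tail probability of chi-square with one degree of freedom."""
    return erfc(sqrt(chi2 / 2))


treatments = ((7, 3), (0, 6))
plain = chi_square_2x2(treatments)
corrected = chi_square_2x2(treatments, yates=True)
print(round(plain, 2), round(corrected, 2))
print(probability_one_df(plain) < 0.01, probability_one_df(corrected) > 0.025)
```

The uncorrected probability falls below 0.01, the exact probability found earlier is 0.0114, and the corrected probability is well above both, which is the over-correction the text refers to.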

The correction involves computing $\chi^2$ from the expression 

$$\chi^2 = \sum \frac{\left(|f - f_c| - \tfrac{1}{2}\right)^2}{f_c}.$$

For purposes of computation, a simpler form is available. It is not given here because 
the use of Yates' correction is not recommended. 

See also "Yates' Correction and the Statisticians," by Franz Adler, in Journal 
of the American Statistical Association, December 1951, pp. 490-501. 

1 × R Tables, Larger Than 1 × 2 

A 1 × 3 table. Freshness has been an advertised feature of various 
brands of coffee for many years. It occurred to one concern to attempt to 
find out whether freshness really made any difference in the taste of 
coffee. To that end, a fairly comprehensive investigation was 
undertaken. One aspect involved 52 tasters, each of whom was given 6 cups 
of coffee: 2 made from fresh coffee, 2 made from coffee 3 weeks old, and 
2 made from coffee 5 weeks old. The tasters were asked to match the 
duplicate cups. Now it is possible to make 15 different matchings of the 
six cups. Of these 15, only one involves a correct matching of all three 
pairs. There are six ways of having one pair correctly matched and 
eight ways of having no pairs correctly matched. It is not possible to 
match two pairs correctly. If no difference existed in the taste of fresh, 
moderately stale, and stale coffee, we would expect the correct matchings 
of three, one, and no pairs to be in the ratio 1:6:8. Table 25.9 shows the 
observed data and the frequencies computed on the basis of these 
proportions. From these two sets of figures, $\chi^2$ is found to be 46.08. Since 
the total is set and there are three categories of sample data, n = 2. 
(The distribution of $\chi^2$ for two degrees of freedom is shown in Chart 25.8.) 
From Appendix J it may be seen that P is much less than 0.001, and it is 
clear that the matchings differ significantly from a chance distribution. 
Apparently it is possible to differentiate between fresh and stale coffee. 
A point worth noting, however, is that the data were so presented by the 
company that it was not possible to determine, when only a single pair 
was matched, how frequently the matching consisted of the two fresh 
cups, or the two cups made from 3-weeks-old coffee, or the two cups made 
from 5-weeks-old coffee! Furthermore, the tasters did not identify the 
matched cups as "fresh," "moderately stale," and "stale." 
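
Both steps of this example, the 1:6:8 ratio and the value of $\chi^2$, can be checked with a short Python sketch. The helper `matchings` and the cup labels are hypothetical names introduced for illustration; the computed frequencies are rounded to one decimal, as in Table 25.9, so that the result agrees with the figure quoted above.

```python
from collections import Counter


def matchings(items):
    """Yield every way of splitting `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in matchings(remaining):
            yield [(first, partner)] + tail


# Six cups: two of each kind of coffee (labels are illustrative).
cups = [("fresh", 1), ("fresh", 2), ("3wk", 1), ("3wk", 2), ("5wk", 1), ("5wk", 2)]

# Count matchings by the number of correctly matched pairs.
ways = Counter(
    sum(a[0] == b[0] for a, b in m) for m in matchings(cups)
)

observed = {3: 15, 1: 24, 0: 13}          # f, from Table 25.9
n_tasters = sum(observed.values())
n_ways = sum(ways.values())
computed = {k: round(n_tasters * ways[k] / n_ways, 1) for k in observed}

chi2 = sum((observed[k] - computed[k]) ** 2 / computed[k] for k in observed)
print(dict(ways), computed, round(chi2, 2))
```

Without the rounding of the computed frequencies the total comes out slightly larger, but the conclusion is unaffected.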

Other 1 × R tables. For tables having one column and more than 
three rows of observed data, the procedure would be similar to that shown 
for a 1 × 3 table in Table 25.9. The degrees of freedom would be R - 1, 
unless the $f$ and $f_c$ values had been made to agree in regard to more 
characteristics than just the total. Tables having one row and C columns are 
rarely encountered, because they are apt to be of unwieldy proportions. 
Such a table could be recast into a 1 × R table. 

TABLE 25.9 

Computation of $\chi^2$ for Matching of Pairs of Cups of Coffee Made 
from Fresh, Three-Weeks-Old, and Five-Weeks-Old Coffee 

Number of pairs                  f_c
correctly matched      f       (1:6:8)     f - f_c    (f - f_c)^2    (f - f_c)^2 / f_c
Three                 15         3.5        +11.5       132.25           37.79
One                   24        20.8        + 3.2        10.24            0.49
None                  13        27.7        -14.7       216.09            7.80
Total                 52        52.0           0                         46.08

Test of "goodness of fit" as a special case of a 1 × R table. In 
Chapter 23, a normal curve was fitted to data of baseball throws for 
distance by first-year high school girls. Columns (2) and (3) of Table 
25.10 show the observed data and the computed frequencies. From 
these two sets of figures, $\chi^2$ is found to be 6.65. Now the observed and 
the fitted data have been forced to agree with each other in regard to $\bar{X}$, 
s, and N. Therefore, three degrees of freedom were lost. Since the 
observed data are in 13 categories, we have n = 13 - 3 = 10. The 
distribution of $\chi^2$ for n = 10 is shown in Chart 25.8. From Appendix J it 
is seen that P is more than 0.75 but less than 0.80, and we conclude that 
the agreement between the observed and computed frequencies is 
satisfactory; we have no reason to doubt the hypothesis that the sample was 
a random one from a normal population. 

Note that the expression (R - 1)(C - 1) is not applicable to a 1 × R table. 

TABLE 25.10 

Chi-Square Test of Goodness of Fit for Normal Curve Fitted to Baseball 
Throws for Distance by First-Year High School Girls 

                           f           f_c
                        observed     expected
Distance in feet        frequency    frequency    f - f_c    (f - f_c)^2    (f - f_c)^2 / f_c
      (1)                  (2)          (3)         (4)          (5)              (6)
Under 25                     1          1.1        -0.1         0.01             0.01
25 but under 35              2          3.2        -1.2         1.44             0.45
35 but under 45              7          9.1        -2.1         4.41             0.48
45 but under 55             25         20.2         4.8        23.04             1.14
55 but under 65             33         35.0        -2.0         4.00             0.11
65 but under 75             53         50.6         2.4         5.76             0.11
75 but under 85             64         57.4         6.6        43.56             0.76
85 but under 95             44         52.0        -8.0        64.00             1.23
95 but under 105            31         37.0        -6.0        36.00             0.97
105 but under 115           27         22.0         5.0        25.00             1.14
115 but under 125           11         10.2         0.8         0.64             0.06
125 but under 135            4          3.7         0.3         0.09             0.02
135 or more                  1          1.5        -0.5         0.25             0.17
Total                      303        303.0          0          ...              6.65

Data from Tables 23.1 and 23.3. 

To avoid the marked effect upon $\chi^2$ of small absolute differences between $f$ and $f_c$, which may occur 
in the end classes, it is not unusual to group several frequencies at one or both ends when making a 
test of "goodness of fit." Because the distribution of $f$ values around $f_c$ does not properly correspond 
to the expected distribution when $f_c$ is small, it has been recommended that no class should have fewer 
than 5 or 10 computed frequencies. However, it has been shown that, if the 0.05 criterion is being 
used, the end frequencies need not be this large. See W. G. Cochran, "The $\chi^2$ Correction for 
Continuity," Iowa State College Journal of Science, Vol. XVI, No. 4, July 1942, pp. 421-430. 
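
The goodness-of-fit computation of Table 25.10 can be reproduced with a brief Python sketch. The variable names are illustrative. Because the table rounds each term of Column (6) to two decimals before adding, the unrounded sum computed here is slightly larger than 6.65.

```python
# Observed and computed frequencies, Columns (2) and (3) of Table 25.10.
observed = [1, 2, 7, 25, 33, 53, 64, 44, 31, 27, 11, 4, 1]
computed = [1.1, 3.2, 9.1, 20.2, 35.0, 50.6, 57.4, 52.0, 37.0, 22.0, 10.2, 3.7, 1.5]

chi2 = sum((f - fc) ** 2 / fc for f, fc in zip(observed, computed))

# Three degrees of freedom are lost: agreement on the mean, s, and N.
categories = len(observed)
constraints = 3
df = categories - constraints

print(round(chi2, 2), df)
```

The same pattern, categories minus the number of characteristics forced to agree, gives the degrees of freedom for any fitted frequency distribution.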


2 × 3 and Larger Tables 

2 × R tables. For tables having two columns and R rows of observed 
data, it is not necessary to use a worksheet such as that in Table 25.7. 
Using the symbols to have the meanings indicated in the following table, 

    a1      b1      N1
    a2      b2      N2
    a3      b3      N3
    .       .       .
    .       .       .
    Na      Nb      N

the value of $\chi^2$ may be computed from the expression 

$$\chi^2 = \frac{N^2}{N_a N_b}\left[\sum \frac{a_i^2}{N_i} - \frac{N_a^2}{N}\right].$$

From information provided by selective service registrants examined 
for military service, sample data were obtained of the number of 
left-handed and right-handed registrants who were examined in the six army 
areas. The proportions of left-handed varied from 7.8 per cent in Area 
IV to 9.2 per cent in Area II. Applying a $\chi^2$ test to the data of Table 
25.11 enables us to ascertain whether the proportions of left- and 
right-handed differed significantly in the various army areas. 

TABLE 25.11 

Number of Left-Handed and Right-Handed Registrants in a Sample* of 
Those Examined in Each of the Six Army Areas 

Army area     Left-handed     Right-handed     Total
I                  161            1,636         1,797
II                 223            2,195         2,418
III                193            2,130         2,323
IV                 137            1,626         1,763
V                  230            2,317         2,547
VI                 120            1,191         1,311
Total            1,064           11,095        12,159

* The sample consisted of the records received by 
the Department of the Army on June 19, June 28, 
and June 30, 1952. 

Data from "Prevalence of Left-Handedness Among 
Selective Service Registrants," by B. D. Karpinos 
and H. A. Grossman, Human Biology, Vol. 25, No. 1, 
pp. 36-49. 

From this table we compute 


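$$\chi^2 = \frac{(12{,}159)^2}{(1{,}064)(11{,}095)}\left[\frac{(161)^2}{1{,}797} + \frac{(223)^2}{2{,}418} + \frac{(193)^2}{2{,}323} + \frac{(137)^2}{1{,}763} + \frac{(230)^2}{2{,}547} + \frac{(120)^2}{1{,}311} - \frac{(1{,}064)^2}{12{,}159}\right] = 3.98.$$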


In order to ascertain the number of degrees of freedom, we compute 
n = (R - 1)(C - 1) = (5)(1) = 5. The distribution of $\chi^2$ for n = 5 is 
shown in Chart 25.8. From Appendix J we find that P is between 0.50 
and 0.70, and we conclude that the proportions of left-handed and 
right-handed from the six areas are not significantly different. 

For tables having C columns and two rows, the expression just used for 
$\chi^2$ may also be used, with appropriate changes of symbols. 
Alternatively, the table may be rearranged into two columns. 

Tables having three or more columns and three or more rows, with 
marginal totals set, are most expeditiously handled by means of a 
computation form such as Table 25.7. The degrees of freedom are 
(R - 1)(C - 1). 
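
The 2 × R shortcut and the general worksheet method can be compared on the left-handedness data of Table 25.11. The sketch below is illustrative only; the function names are hypothetical, and the shortcut is written as $N^2/(N_a N_b)$ times the bracketed quantity, matching the numerical computation shown for the six army areas.

```python
def chi_square_2xR(first_col, second_col):
    """Shortcut for a table with two columns and R rows."""
    row_totals = [a + b for a, b in zip(first_col, second_col)]
    na, nb = sum(first_col), sum(second_col)
    n = na + nb
    bracket = sum(a * a / t for a, t in zip(first_col, row_totals)) - na * na / n
    return n * n / (na * nb) * bracket


def chi_square_general(table):
    """Worksheet method: sum of (f - fc)^2 / fc over all boxes."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            fc = row_totals[i] * col_totals[j] / n
            total += (f - fc) ** 2 / fc
    return total


left = [161, 223, 193, 137, 230, 120]
right = [1636, 2195, 2130, 1626, 2317, 1191]

shortcut = chi_square_2xR(left, right)
general = chi_square_general(list(zip(left, right)))
df = (len(left) - 1) * (2 - 1)
print(round(shortcut, 2), round(general, 2), df)
```

The general function also serves for tables with three or more rows and columns, where no shortcut is offered.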

When making chi-square tests, a very large probability may 
occasionally appear. Some writers have pointed out that a probability of 0.99 
is just as unusual as 0.01, and that, if we were to consider 0.01 as 
discrediting a hypothesis, then 0.99 just as clearly discredits a hypothesis as 
does a probability of 0.01. It is true that an occurrence having a 
probability of 0.99 is just as surprising as an occurrence having a probability 
of 0.01, but it does not follow that a probability of 0.99 discredits the 
hypothesis. The startling agreement between sample and population 
or between two samples should lead us to look, more carefully than usual, 
for possibly "rigged" data, for arithmetic mistakes, for previous 
smoothing of the data if "goodness of fit" is involved, or for a carelessly designed 
experiment. 

As a matter of fact, either extremely large or surprisingly small values 
of P should cause us to re-examine the situation. Consider the following 
incident which was mentioned on page 12: When fluorescent lighting was 
first introduced, some persons believed that radiation from the lights 
would sterilize people. Hoping to allay their fears, a railroad, which had 
already installed the lights, subjected one group of rats to incandescent 
light and a second group to fluorescent light. The first group had the 
usual number of offspring, the second group had none. This seemed, 
indeed, to reinforce the fears of those who thought that the fluorescent 
lights might sterilize. The result seemed so surprising that one executive 
asked that the second group of rats be carefully checked. Upon exami- 
nation, they were found to be all of the same sex. 

A discussion appears in "Too Good to Be True," by Alan Stuart, Applied 
Statistics, March 1954, pp. 29-32. 



Symbols Used in Chapter 26 

Variances 

G: the geometric mean. 

k: number of samples. 

L: the ratio of the geometric mean of several variances to their arithmetic 
mean. 

n: degrees of freedom. 

$n_1, n_2, n_3, \ldots$: respectively, degrees of freedom in samples 1, 2, 3, .... 
$n_k$ refers to the number of degrees of freedom in the k'th sample. 

N: number of items in a sample. 

$N_1, N_2, N_3, \ldots$: respectively, number of items in samples 1, 2, 3, .... 
$N_k$ refers to the number of items in the k'th sample. 

$N_i$: used in connection with L to indicate the number of items in any one 
of several samples of equal size. 

P: probability; varies from 0 to 1. 

$s^2$: the variance of a sample. 

$s_1^2$: the variance of sample 1. 

$s_2^2$: the variance of sample 2. 

$\sigma^2$: population variance. 

$\sigma_1^2$: the lower confidence limit of $\sigma^2$. 

$\sigma_2^2$: the upper confidence limit of $\sigma^2$. 

$\hat{\sigma}^2$: the estimated variance of a population obtained from a sample. 

$\hat{\sigma}_1^2, \hat{\sigma}_2^2, \hat{\sigma}_3^2, \ldots$: respectively, estimates of population variance from 
samples 1, 2, 3, .... $\hat{\sigma}_k^2$ refers to the estimate from the k'th sample. 

$\Sigma$: upper-case Greek sigma, meaning "take the sum of." 

x: $X - \bar{X}$. 

$x_1$: a deviation of a value in sample 1 from $\bar{X}_1$; $\Sigma x_1^2 = \Sigma (X_1 - \bar{X}_1)^2$. 

$x_2$: a deviation of a value in sample 2 from $\bar{X}_2$; $\Sigma x_2^2 = \Sigma (X_2 - \bar{X}_2)^2$. 

$\bar{X}_1$: the arithmetic mean of sample 1. 

$\bar{X}_2$: the arithmetic mean of sample 2. 

$\chi^2$: see Chapter 25. The symbol is a lower-case Greek chi. 

$\infty$: infinity sign. 

Analysis of Variance 

F: the ratio of two estimates of $\sigma^2$. 

$k_b$: the number of boxes. 

$k_c$: the number of columns. 

$k_r$: the number of rows. 

n: degrees of freedom. 

$n_1$: degrees of freedom associated with the numerator of F. 

$n_2$: degrees of freedom associated with the denominator of F. 

N: number of items in all rows, all columns, or all boxes. 

$N_b$: number of items in a box. 

$N_c$: number of items in a column. 

$N_r$: number of items in a row. 

$N_1, N_2, N_3, \ldots$: respectively, the number of items in Columns 1, 2, 
3, .... 

P: probability; varies from 0 to 1. 

$\hat{\sigma}^2$: estimate of population variance using $\sum_{1}^{N} (X - \bar{X})^2$. 

$\Sigma$: upper-case Greek sigma, meaning "take the sum of." 

$\sum_{1}^{k_b}$: a summation over the $k_b$ boxes. 

$\sum_{1}^{k_c}$: a summation over the $k_c$ columns. 

$\sum_{1}^{k_r}$: a summation over the $k_r$ rows. 

$\sum_{1}^{N}$: a summation over all items. Same as $\Sigma$. 

$\sum_{1}^{N_b}$: a summation over the $N_b$ items in a box. 

$\sum_{1}^{N_c}$: a summation over the $N_c$ items in a column. 

$\sum_{1}^{N_r}$: a summation over the $N_r$ items in a row. 

t: see Chapter 24. $t = \sqrt{F}$ when $n_1 = 1$. 

X: an observed value. 

$\bar{X}$: the arithmetic mean of all the items, "the grand mean." 

$\bar{X}_b$: the arithmetic mean of a box. 

$\bar{X}_c$: the arithmetic mean of a column. 

$\bar{X}_r$: the arithmetic mean of a row. 

$\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots$: respectively, the arithmetic means of Columns 1, 2, 
3, .... 

$\chi^2$: chi-square; see Chapter 25. $\chi^2/n = F$ when $n_2 = \infty$. 

Skewness and Kurtosis 

$\beta_1$: lower-case Greek beta; measure of skewness in a sample. See Chapter 
10. 

$\beta_2$: lower-case Greek beta; measure of kurtosis in a sample. See Chapter 
10. 

N: number of items in a sample. 

Correlation Coefficients 

b: slope of the estimating equation $Y_c = a + bX$. 

F: a ratio between two estimated variances. 

$\eta^2_{Y.X}$: lower-case Greek eta; the square of the correlation ratio based on 
column means (see Chapter 20); sometimes referred to as the "ratio of 
determination." 

lower-case Greek eta; population estimate of 
m: number of constants in an estimating equation. For the correlation 
ratio $\eta_{Y.X}$, m is the number of columns. 

n: degrees of freedom. 

$n_1$ and $n_2$: respectively, degrees of freedom associated with the numerator 
and the denominator of F. 

N: number of items in a sample. In two-variable linear or non-linear 
correlation, N is the number of pairs of items. In multiple or partial 
correlation, N is the number of sets of observations. 

$N_1$ and $N_2$: respectively, the number of pairs of items from which $r_1$ and 
$r_2$ were computed. 

P: probability; varies from 0 to 1. 

r: sample coefficient of correlation, linear correlation of two variables. 
When two samples are under consideration, we use $r_1$ and $r_2$. 

$r_{\mathcal{P}}$: population coefficient of correlation, linear correlation of two variables. 

$r_{\mathcal{P}_1}$: lower confidence limit of $r_{\mathcal{P}}$. 

$r_{\mathcal{P}_2}$: upper confidence limit of $r_{\mathcal{P}}$. 

P: estimated value of r|; obtained from a sample. 
ri3.2: coefficient of partial determination. See Chapter 21 . 
rL,23-**on»i}* ^ general form of the coefficient of partial determination for 
m variables. 

estimated population value of 

^i3.24j ^14.23- the thrcc forms of the coefficient of partial determination 
for four variables, when Xi is the dependent variable. 

coefficient of partial determination; the additional variation in Y 
explained by expressed as a proportion of the variation in Y which 
was unexplained by X. 
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coefficient of determination for X and Y, the estimating equation 
Yc = a + bX + cX^ having been used. 

^r.xx2’ population estimate of 

^Lts.xxs* coefficient of partial determination; the additional variation in 
Y explained by X®, expressed as a proportion of the variation in Y 
which was unexplained by X and X^. 

^f.xx2x»* coefficient of determination for X and F, the estimating equation 
Yc — a + bX + cX^ + dX^ having been used. 

^^y.xx^x^- population estimate of 

^1.23* coefficient of multiple determination; the proportion of variation 
in Xi which ^vas explained by X2 and Xz> 

•^1.234- coefficient of multiple determination; the proportion of variation 
in Xi which was explained by Xo, X3, and .Y4, 

^1.234- '-m- s, general form of the coefficient of multiple determination for 
m variables. 

■Si.234--*m- estimated population value of i2L234---m* 

Syi total variance of the F series. 

®F,x* square of the standard error of estimate for the estimating 
equation Yc — a + bX; unexplained variance. 

$■^1 estimated variance in a population. 

a-l: estimated population variance (total variance) of the F series. 

^r.x* population estimate of the unexplained variance resulting from use 
of the estimating equation Fc = a + 6X. 

(Tg: standard error of z, 

standard error of zi — ^2. 

$\Sigma$: upper-case Greek sigma, meaning "take the sum of." 
total variation in the Xi series. 

2x^1. 23- explained variation resulting from use of the estimating equation 
Xc 1.23 ~ ^^1.23 “h 612.3X2 + bu. 2 Xz- 

2x^1.234* explained variation resulting from use of the estimating equation 

X,i .234 “ <^^1,234 + 612.34X2 + 613.24X3 + 614.23X4. 

2xci.234.--m: a general form, explained variation resulting from use of the 
estimating equation X<;i.234...m ~ Ui.234.-.wi6i2.34 ‘--mX 2 “{■ 613.24.- •??! Xa 

+ 614.23... mX4 + • • • + 6lm.23.-.(m-l)Xwi. 

Sxfi,234...cm-i): explained variation resulting from use of the esti- 
mating equation Xcl.234-..(m-l) == ai.234... (m^n + 6l2.34...(m-l)X2 + 
6l3.24...Cm-l)X3 + 614.23... Cto-1)X4 + ’ * * + 6 i(to„i ).23 - - - (w-2)X(m~l). 

2x;i, 23: unexplained variation resulting from use of the estimating equa- 
tion shown for 

2x^1.234: unexplained variation resulting from use of the estimating equa- 
tion shown for 
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^ general form; unexplained variation resulting from use of 
the estimating equation shown for 207^1.234*- 

2 a;^i. 234 - ••(«-!)- unexplained variation resulting from use of the esti- 
mating equation shown for Sxci .234 ...(w-i)‘ 
total variation of the Y series. 

St/c'. explained variation resulting from use of the estimating equation 
Fc - a + bX. 

22/cf.zx®‘ explained variation resulting from use of the estimating equa- 
tion Fc = ex “h bX Hh cX^. 

explained variation resulting from use of the estimating equa- 
tion Vc a + bX + cX^ + dX\ 

Idyl: unexplained variation resulting from use of the estimating equation 

Fc = €5 + hX. 

unexplained variation resulting from use of the estimating equa- 
tion Fc - a + + cX2. 

^vIy.xx^x^* unexplained variation resulting from use of the estimating 


i: 


equation Fc = 
jr^N - m) 
\ 1 - ' 


G “h bJi “f* cX^ “f” dX^, 
or an equivalent expression (see note 15). 


may be 


either a two-variable linear coefBcient of determination or a partial 


coeflScient of determination. 


X , , Z — 0 Zi ^ Z2 

a deviation divided by its standard error; for example, or 

fJ ^ CTz ' O’zi’-zi 

X: an observed value in the X series; also, the X series. 

$X_1, X_2, X_3, X_4, \ldots$: respectively, the $X_1, X_2, X_3, X_4, \ldots$ series; also, 
observed values in those series. Thus, we may refer to correlating $X_1$ 
with $X_2$, $X_3$, and $X_4$, but $\Sigma X_1$ means "take the sum of the values in the 
$X_1$ series." 

$\bar{X}$: the arithmetic mean of the X series. 

y: $Y - \bar{Y}$. 

$y_c$: $Y_c - \bar{Y}$. See also $\Sigma y_c^2$ and $\Sigma y_s^2$ with additional subscripts. 

$y_s$: $Y - Y_c$. See also $\Sigma y_c^2$ and $\Sigma y_s^2$ with additional subscripts. 

Y: an observed value in the Y series; also, the Y series. 

$\bar{Y}$: the arithmetic mean of the Y series. 

$Y_c$: a computed Y value. 

z: $1.15129 \log \dfrac{1 + r}{1 - r}$. When two samples are under consideration, we use 
$z_1$ and $z_2$ to correspond to $r_1$ and $r_2$. 

$z_{\mathcal{P}}$: $1.15129 \log \dfrac{1 + r_{\mathcal{P}}}{1 - r_{\mathcal{P}}}$. 

$z_{\mathcal{P}_1}$: lower confidence limit of $z_{\mathcal{P}}$. 

$z_{\mathcal{P}_2}$: upper confidence limit of $z_{\mathcal{P}}$. 



CHAPTER 26 


Statistical Significance III: 
Variances, Analysis of Variance, 
Measures of Skewness and Kurtosis, 
and Correlation Coefficients 


In this, the last chapter of the book, we shall give attention to variances 
computed from samples, the variance of several means (analysis of 
variance), values of $\beta_1$ and $\beta_2$ obtained from samples, and correlation 
coefficients. 

VARIANCES 

Our consideration of sample variances will parallel the treatment 
of arithmetic means and proportions in that we shall first consider the 
difference between $\hat{\sigma}^2$ and $\sigma^2$; next we shall obtain confidence limits of $\sigma^2$; 
and then we shall compare two sample variances. In addition, we shall 
give attention to one way of comparing several sample variances. 

Variances of random samples from a normal population are distributed 
neither normally nor symmetrically. Their distribution follows a skewed 
curve (skewed to the right), the exact shape of which depends upon $\sigma^2$ 
and N. Since tables giving values of $\hat{\sigma}^2$ for several values of P would 
have to have both $\sigma^2$ and N as arguments, and would therefore be very 
extensive, it is fortunate that $(N - 1)\hat{\sigma}^2/\sigma^2$ follows the chi-square 
distribution for N - 1 degrees of freedom. Thus, we write 

$$\chi^2 = \frac{(N - 1)\hat{\sigma}^2}{\sigma^2}.$$

In the event that $s^2$ is given, rather than $\hat{\sigma}^2$, we may obtain $\hat{\sigma}^2$ from the 
expression 

$$\hat{\sigma}^2 = s^2\,\frac{N}{N - 1}.$$

Alternatively, we may apply the $\chi^2$ test in the form 

$$\chi^2 = \frac{N s^2}{\sigma^2},$$

with n = N - 1 for $\chi^2$. 

Significance of the difference between $\hat{\sigma}^2$ and $\sigma^2$. Below Table 
24.1 it may be seen that the value of $\hat{\sigma}^2$ for 10 pieces of hard-drawn copper 
wire was 75.73. In this case, as in most others, we do not know the value 
of $\sigma^2$, but, for purposes of illustration, we shall assume that $\sigma^2 = 46.42$ 
and test the hypothesis that $\hat{\sigma}^2 = 75.73$ is the variance of a random 
sample from a population having $\sigma^2 = 46.42$. We shall use 0.05 as our 
criterion. Computing $\chi^2$, we find 

$$\chi^2 = \frac{(N - 1)\hat{\sigma}^2}{\sigma^2} = \frac{(9)(75.73)}{46.42} = 14.683$$

for n = N - 1 = 9. From the $\chi^2$ table of Appendix J, it is seen that, if 
$\sigma^2 = 46.42$, the probability of obtaining $\hat{\sigma}^2 = 75.73$ or larger, for samples 
of 10, is almost exactly 0.10. Our hypothesis is not discredited. Note 
that, in this application, $\chi^2$ has provided us with a one-tail test, since the 
probability which was obtained refers to values of $\hat{\sigma}^2$ equal to or larger 
than that observed. 
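
The computation just shown, and its equivalence with the form that uses $s^2$, can be sketched in Python. The function names are illustrative; the probability itself is still read from Appendix J and is not computed here.

```python
def chi_square_from_estimate(var_estimate, sigma2, n_items):
    """chi-square = (N - 1) * estimated variance / population variance."""
    return (n_items - 1) * var_estimate / sigma2


def chi_square_from_sample_variance(s2, sigma2, n_items):
    """chi-square = N * s^2 / population variance (s^2 uses divisor N)."""
    return n_items * s2 / sigma2


n_items = 10
var_estimate = 75.73                              # copper wire data
s2 = var_estimate * (n_items - 1) / n_items       # same data with divisor N

chi2 = chi_square_from_estimate(var_estimate, 46.42, n_items)
chi2_alt = chi_square_from_sample_variance(s2, 46.42, n_items)
print(round(chi2, 3), round(chi2_alt, 3), n_items - 1)
```

Either form leads to the same value, to be looked up with n = N - 1 = 9.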

If we are interested in considering values of $\hat{\sigma}^2$ which are less than $\sigma^2$, 
more than one avenue of approach is open to us. We may ascertain the 
probability of a value of $\hat{\sigma}^2$ showing the same absolute difference, but in 
the opposite direction. That is, $\hat{\sigma}^2 = 17.11$. Alternatively, we may 
determine the value of $\hat{\sigma}^2$ which cuts off the lower 10 per cent tail of the 
distribution of $\chi^2$ for n = 9. Considering these two, in turn, we find 
that, when $\hat{\sigma}^2 = 17.11$, 

$$\chi^2 = \frac{(9)(17.11)}{46.42} = 3.317,$$

and the probability is about 0.05 that values of $\hat{\sigma}^2$ equal to or smaller than 
17.11 would occur. The value of $\hat{\sigma}^2$ which cuts off the lower 10 per cent 
tail of the distribution of $\chi^2$ is obtained by using the $\chi^2$ value for P = 0.90 
when n = 9 in Appendix J. This is 4.168, and we write 

$$4.168 = \frac{9\hat{\sigma}^2}{46.42}, \qquad 9\hat{\sigma}^2 = 193.47856, \qquad \hat{\sigma}^2 = 21.50.$$

The fact that the $\chi^2$ test involves the ratio of $\hat{\sigma}^2$ to $\sigma^2$ may have already 
suggested to the reader that, when n = 9 and when $\chi^2 = 14.684$ (the 
value of $\chi^2$ at the upper 0.10 point), the resulting probability of 0.10 may 
refer to any pair of values for $\hat{\sigma}^2$ and $\sigma^2$ giving the ratio 14.684 ÷ 9 = 
1.632. Whenever $\hat{\sigma}^2/\sigma^2 = 1.632$, the value of $\chi^2$ will be at the upper 0.10 
point. In symbols, 

$$\frac{\chi^2}{n} = \frac{\hat{\sigma}^2}{\sigma^2},$$

and from this relationship the table of Appendix K was prepared. This 
table enables one to compute sampling limits of $\hat{\sigma}^2$ merely by dividing 
$\hat{\sigma}^2$ by $\sigma^2$, making it unnecessary to compute $\chi^2$. For the preceding 
illustration, where $\hat{\sigma}^2 = 17.11$ and $\sigma^2 = 46.42$, the ratio is 0.3686. 
Looking up this ratio in Appendix K for n = 9 gives a probability (lower point) 
of about 0.05, the same as obtained before. 

Confidence limits of $\sigma^2$. We may also employ $\chi^2$ to obtain the 
confidence limits of $\sigma^2$. For the data of hard-drawn copper wire, $\hat{\sigma}^2 = 
75.73$ and N = 10. What are the 90 per cent confidence limits of $\sigma^2$? 
To answer this question, we use two chi-square values from Appendix J 
for n = 9: one at the upper 0.05 point and one at the lower 0.05 point (the 
0.95 point in Appendix J). These $\chi^2$ values are 16.919 and 3.325, and 
we solve $\chi^2 = n\hat{\sigma}^2/\sigma^2$ for $\sigma_1^2$ and $\sigma_2^2$: 

$$16.919 = \frac{(9)(75.73)}{\sigma_1^2}, \qquad 16.919\,\sigma_1^2 = 681.57, \qquad \sigma_1^2 = 40.28,$$

and 

$$3.325 = \frac{(9)(75.73)}{\sigma_2^2}, \qquad 3.325\,\sigma_2^2 = 681.57, \qquad \sigma_2^2 = 205.0.$$

The 90 per cent confidence limits of $\sigma^2$ are 40.28 and 205.0. As before, 
if we compute many such 90 per cent limits from random samples from 
a normal population, our statements will include the population value 
90 per cent of the time and fail to include it 10 per cent of the time. 
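
A small Python sketch of the confidence-limit computation follows. The function name is illustrative, and the two chi-square values are simply the Appendix J figures quoted above; they are supplied by hand rather than computed.

```python
def variance_confidence_limits(var_estimate, df, chi2_upper_point, chi2_lower_point):
    """Solve chi-square = n * estimate / sigma^2 for sigma^2 at each point."""
    lower_limit = df * var_estimate / chi2_upper_point
    upper_limit = df * var_estimate / chi2_lower_point
    return lower_limit, upper_limit


# Copper wire data: estimate 75.73, n = 9; Appendix J values for the
# upper 0.05 point (16.919) and the lower 0.05 point (3.325).
lower, upper = variance_confidence_limits(75.73, 9, 16.919, 3.325)
print(round(lower, 2), round(upper, 1))
```

Note that the larger chi-square value yields the lower limit, and the smaller value the upper limit.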

The ratio $\dfrac{\chi^2}{n} = \dfrac{\hat{\sigma}^2}{\sigma^2}$ is a special case of F (see page 720) when $n_2 = \infty$. 




Rodger P. Doyle computed the 90 per cent confidence limits of $\sigma^2$ for 
each of Shewhart's 1,000 samples from a normal population. His limits 
included $\sigma^2$ in 904 instances but did not do so for 96 of the samples. 

We may recast the expression 

$$\chi^2 = \frac{n\hat{\sigma}^2}{\sigma^2}$$

to read 

$$\frac{\sigma^2}{\hat{\sigma}^2} = \frac{n}{\chi^2}$$

to enable us to make a table from which to obtain the confidence limits of 
$\sigma^2$. Such a table is given as Appendix L. Using it to get the 90 per cent 
confidence limits of $\sigma^2$, when n = 9, which were just obtained by use of 
$\chi^2$, we would compute 

$$\sigma_1^2 = 0.5319\,\hat{\sigma}^2 = (0.5319)(75.73) = 40.28,$$

and 

$$\sigma_2^2 = 2.707\,\hat{\sigma}^2 = (2.707)(75.73) = 205.0.$$


Significance of the difference between two sample variances.
In Chapter 24 we considered the significance of the difference between the
mean lengths of two sets of lower first molars which had $N_1 = 16$,
$s_1 = 0.72$, $N_2 = 9$, and $s_2 = 0.62$. We previously found that there was
not a significant difference between $\bar{X}_1$ and $\bar{X}_2$. Using the
0.05 level as our criterion, let us now test the hypothesis that the two
samples were from the same population in respect to $\sigma^2$.

When $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ are independent estimates of
$\sigma^2$ from the same normal population, their ratio
$\hat{\sigma}_1^2/\hat{\sigma}_2^2$ is distributed according to the F
distribution with $n_1 = N_1 - 1$ and $n_2 = N_2 - 1$ degrees of freedom. If
$\hat{\sigma}_1^2 = \hat{\sigma}_2^2$, the value of F is 1.0. Values of F vary
from 0 to 0.999... when $\hat{\sigma}_1^2 < \hat{\sigma}_2^2$ and from
1.000...1 to $\infty$ when $\hat{\sigma}_1^2 > \hat{\sigma}_2^2$. The F
distribution is reverse-J shaped when $n_1 = 1$ or $n_1 = 2$ and skewed to
the right when $n_1 \geq 3$. Several F distributions are shown in Chart 26.1.

For the data of lower first molars we found, in Chapter 24, $\sum x_1^2 = 8.29$


² From unpublished material.
[Chart 26.1. Distribution of F for $n_1 = 1$, $n_2 = 5$; $n_1 = 2$, $n_2 = 5$;
and $n_1 = 5$, $n_2 = 4$. Vertical axis: height of ordinate; horizontal axis:
values of F. Horizontal and vertical scales extend to $\infty$. The ordinates
of the F distribution are obtained from the expression

$$y = C\,\frac{F^{(n_1 - 2)/2}}{(n_1 F + n_2)^{(n_1 + n_2)/2}},$$

where $C$ is a constant that depends only on $n_1$ and $n_2$.]

with $n_1 = 15$ and $n_2 = 8$. Values of F for selected values of $n_1$ and
$n_2$ and for probabilities of 0.10, 0.05, 0.025, 0.01, and 0.001 in the right
tail of the distribution are given in Appendix M. Referring to that appendix,
we find that $n_1 = 15$ is not given, but $n_1 = 12$ and $n_1 = 24$ are given,
and so is $n_2 = 8$. It is not necessary to interpolate for $n_1 = 15$, since
the probability of $F \geq 1.28$ exceeds 0.10 whether we consider $n_1 = 12$
and $n_2 = 8$ or $n_1 = 24$ and $n_2 = 8$. The observed value of
$\hat{\sigma}_1^2$ does not significantly exceed the observed value of
$\hat{\sigma}_2^2$. But what about differences in the reverse direction?

If $\hat{\sigma}_1^2$ had been 0.432 with $N_1 = 16$ and $\hat{\sigma}_2^2$
had been 0.553 with $N_2 = 9$, then we would have

$$F = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} = \frac{0.432}{0.553} = 0.781,$$

with $n_1 = 15$ and $n_2 = 8$. Now, the table of Appendix M does not include
any F values smaller than 1.0. When a value of F is less than one,
we can obtain the probability³ of that F value or less by computing
$1/F$, which will exceed 1.0, and reversing the degrees of freedom. That is,
we would look up

$$\frac{1}{0.781} = 1.28$$

with $n_1 = 8$ and $n_2 = 15$. Doing this, we find that the probability of
$F \geq 1.28$ when $n_1 = 8$ and $n_2 = 15$ is more than 0.10; therefore, the
probability is also more than 0.10 for a value of $F \leq 0.781$ with
$n_1 = 15$ and $n_2 = 8$.
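The reciprocal rule can be put into a small helper. This is a minimal sketch under our own naming (`f_for_table` is not from the text); it only prepares the value and degrees of freedom to look up in an upper-tail F table such as Appendix M, and does not compute the probability itself.

```python
def f_for_table(var1, n1, var2, n2):
    """Return (F to look up, n1 for lookup, n2 for lookup, tail tested).

    var1, var2 -- the two estimated variances (numerator, denominator)
    n1, n2     -- their degrees of freedom
    If var1/var2 is below 1.0, the reciprocal is returned with the
    degrees of freedom reversed, so an upper-tail table can be used.
    """
    F = var1 / var2
    if F >= 1.0:
        return F, n1, n2, "upper"
    return 1.0 / F, n2, n1, "lower"


# The reversed molar example: 0.432 with n = 15 against 0.553 with n = 8.
F_lookup, df1, df2, tail = f_for_table(0.432, 15, 0.553, 8)
print(round(F_lookup, 2), df1, df2, tail)
```

For the reversed molar example the helper returns 1.28 with 8 and 15 degrees of freedom, flagged as a lower-tail test of the original ratio.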

Comparison of several values of $\hat{\sigma}^2$. Sometimes it is important to
know whether uniformity exists between several values of $\hat{\sigma}^2$. A
pencil manufacturing concern made tests of the strength of the lead of their
own pencils and of pencils manufactured by five of their competitors. The
tests included five pencils of each hardness, 1, 2, 2.5, 3, and 4, from each
of the six companies. Each individual pencil was tested four times.

For five Number 2 pencils, made by a company which we shall call
"Company D," the tests⁴ showed $\hat{\sigma}_1^2 = 0.01316$,
$\hat{\sigma}_2^2 = 0.05667$, $\hat{\sigma}_3^2 = 0.02787$,
$\hat{\sigma}_4^2 = 0.01930$, $\hat{\sigma}_5^2 = 0.01529$.
$N_1 = N_2 = N_3 = N_4 = N_5 = 4$.

One way to compare these variances would be to compute F for
$\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$, for $\hat{\sigma}_1^2$ and
$\hat{\sigma}_3^2$, and so on. Another procedure involves comparing all
of the $\hat{\sigma}^2$ values at once by means of the measure⁵ L, sometimes
referred to as a criterion of likelihood.


³ An abbreviated table, prepared by the authors of this volume and showing both
upper and lower points, may be found in F. E. Croxton, Elementary Statistics with
Applications in Medicine, Prentice-Hall, Inc., New York, 1953, pp. 334-335.

⁴ The test data are shown in Table 26.3.

⁵ See J. Neyman and E. S. Pearson, "On the Problem of k Samples," Akademija
Umiejetnosci, Bulletin International de l'Académie Polonaise des Sciences et des Lettres,
Série A, Sciences Mathématiques, 1931, pp. 460-481.
$$L = \frac{\sqrt[k]{\hat{\sigma}_1^2 \times \hat{\sigma}_2^2 \times \cdots \times \hat{\sigma}_k^2}}{\dfrac{1}{k}\left(\hat{\sigma}_1^2 + \hat{\sigma}_2^2 + \cdots + \hat{\sigma}_k^2\right)}$$

if $N_1 = N_2 = \cdots = N_k$. If the samples include varying numbers of
items,

$$L = \frac{\sqrt[n]{(\hat{\sigma}_1^2)^{n_1} \times (\hat{\sigma}_2^2)^{n_2} \times \cdots \times (\hat{\sigma}_k^2)^{n_k}}}{\dfrac{1}{n}\left(n_1\hat{\sigma}_1^2 + n_2\hat{\sigma}_2^2 + \cdots + n_k\hat{\sigma}_k^2\right)},$$

where $n = n_1 + n_2 + \cdots + n_k$. The numerator is the geometric
mean of the $\hat{\sigma}^2$ values, while the denominator is the arithmetic
mean of the $\hat{\sigma}^2$ values. We already know (Chapter 9) that the
geometric mean of a series of values, which are not all the same, is smaller
than the arithmetic mean of those values. Also, the more divergent the
values, the greater the difference between $G$ and $\bar{X}$. Now, if
$\hat{\sigma}_1^2 = \hat{\sigma}_2^2 = \cdots = \hat{\sigma}_k^2$, a condition
of maximum uniformity obtains, and the value of L is 1.0. If there is any
difference between the $\hat{\sigma}^2$'s, the value of L will be less than
1.0, approaching 0 as its lower limit. L = 0 represents a condition of maximum
non-uniformity and is a theoretical limit which would not be approached
in actual practice.

Computing L for the five Number 2 pencils made by Company D gives

$$L = \frac{\sqrt[5]{0.01316 \times 0.05667 \times 0.02787 \times 0.01930 \times 0.01529}}{\frac{1}{5}(0.01316 + 0.05667 + 0.02787 + 0.01930 + 0.01529)} = \frac{0.02278}{0.02646} = 0.86.$$


It would appear, since 0.86 is not far removed from 1.0, that uniformity
exists among the five values of $\hat{\sigma}^2$. However, we want to know
whether L = 0.86 differs significantly from 1.0. The hypothesis to be tested
is that the five variances were from random samples from the same population
in regard to $\sigma^2$. The distribution of L, for samples drawn from a
normal population, is J-shaped, as shown by the small chart above
Appendix N. This appendix gives values of L at the 0.05 and 0.01 points
for various values of $N_i$ and $k$, where $N_i$ refers to the number of items
in any one of the samples of equal size. For our problem, $N_i = 4$ and
$k = 5$, and, from Appendix N, it is seen that L = 0.491 is at the 0.05
point while L = 0.370 is at the 0.01 point. It is clear that the observed
value of L = 0.86 does not differ significantly from 1.0; the hypothesis is
not discredited.
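As an illustration, L can be computed as the ratio of a (weighted) geometric mean to a (weighted) arithmetic mean. The sketch below is ours, not the authors': the function name `l_criterion` and the use of logarithms for the geometric mean are assumptions made for convenience, and the comparison with Appendix N must still be done from the table.

```python
import math


def l_criterion(variances, dfs=None):
    """Ratio of the geometric mean to the arithmetic mean of the variances.

    variances -- the estimated variances (all must be positive)
    dfs       -- degrees of freedom used as weights; omit for equal samples
    """
    if dfs is None:
        dfs = [1] * len(variances)           # equal weights
    n = sum(dfs)
    # Weighted geometric mean, computed through logarithms.
    log_gm = sum(d * math.log(v) for d, v in zip(dfs, variances)) / n
    # Weighted arithmetic mean.
    am = sum(d * v for d, v in zip(dfs, variances)) / n
    return math.exp(log_gm) / am


company_d = [0.01316, 0.05667, 0.02787, 0.01930, 0.01529]
L = l_criterion(company_d)
print(round(L, 2))
```

When all the variances are equal the function returns 1.0, in agreement with the condition of maximum uniformity described above.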

Values of L were computed for the variances of Number 2 pencils made
by each of the other five companies. In one instance, L = 0.30 with
$N_i = 4$ and $k = 5$ as before. This value for L is beyond the 0.01 point
and would be considered significantly different from 1.0.

ANALYSIS OF VARIANCE 

In Chapter 24 we considered the significance of the difference between
two means. The discussion of analysis of variance, which follows, deals
with two or more means. In its simplest aspect, analysis of variance will
have to do with two independent estimates of $\sigma^2$ which will be compared
with each other by means of F.

One criterion of classification. In Table 26.1, data are shown of 
the length of eggs of the European cuckoo found in the nests of three 
other species of birds. The European cuckoo makes a practice of per- 
mitting other birds to hatch its eggs and rear its offspring. We are 
interested in knowing whether the mean lengths of cuckoo eggs found in 
the nests of the hedge-sparrow, the robin, and the wren are significantly 
different from each other. We shall not compare the first mean with the 
second, the first with the third, and the second with the third. We shall 
consider the three means as a group, comparing the estimated variance of 
those three means (one estimate of the variance in the population) with
the estimated variance within the three columns (a second estimate of 
the variance in the population). 

The data of Table 26.1 are classified according to one criterion: the
species of bird in whose nest the cuckoo's eggs were found. For such a table,
there are three sources of variation.

1. Variation between column means. The variation between column
means is obtained by taking the differences between each column mean
($\bar{X}_1, \bar{X}_2, \bar{X}_3, \cdots$) and the "grand mean" ($\bar{X}$,
the arithmetic mean of all the values), squaring each difference, multiplying
each squared difference by the number of items in the appropriate column
($N_1, N_2, N_3, \cdots$), and summing. Symbolically, this is

$$N_1(\bar{X}_1 - \bar{X})^2 + N_2(\bar{X}_2 - \bar{X})^2 + N_3(\bar{X}_3 - \bar{X})^2 + \cdots.$$

Using $\bar{X}_c$ to indicate a column mean, $N_c$ the number of items in a
column, and $k_c$ the number of columns, variation between column means may be
written

$$\sum_{1}^{k_c}\left[N_c(\bar{X}_c - \bar{X})^2\right],$$

where $\sum_{1}^{k_c}$ indicates that a summation over the $k_c$ columns is to
be made.

The expression just given calls for the computation of $k_c$ column means
and the grand mean. This is not necessary, as it is shown in Appendix S,
TABLE 26.1

Computation of Values Required for Analysis of Variance of Data
of Length of Cuckoo's Eggs Found in the Nests of Three Species
of Birds

          Hedge-sparrow           Robin                 Wren
          X_1      X_1^2       X_2      X_2^2       X_3      X_3^2
          22.0     484.00      21.8     475.24      19.8     392.04
          23.9     571.21      23.0     529.00      22.1     488.41
          20.9     436.81      23.3     542.89      21.5     462.25
          23.8     566.44      22.4     501.76      20.9     436.81
          25.0     625.00      22.4     501.76      22.0     484.00
          24.0     576.00      23.0     529.00      21.0     441.00
          21.7     470.89      23.0     529.00      22.3     497.29
          23.8     566.44      23.0     529.00      21.0     441.00
          22.8     519.84      23.9     571.21      20.3     412.09
          23.1     533.61      22.3     497.29      20.9     436.81
          23.1     533.61      22.0     484.00      22.0     484.00
          23.5     552.25      22.6     510.76      20.0     400.00
          23.0     529.00      22.0     484.00      20.8     432.64
          23.0     529.00      22.1     488.41      21.2     449.44
                               21.1     445.21      21.0     441.00
                               23.0     529.00
Total    323.6   7,494.10     360.9   8,147.53     316.8   6,698.78

Data from Oswald H. Latter, "The Egg of Cuculus Canorus," Biometrika, Vol. 1, p.
173.

$N = 45$.

$\sum X = 323.6 + 360.9 + 316.8 = 1{,}001.3$.

$(\sum X)^2 = (1{,}001.3)^2 = 1{,}002{,}601.69$.

$\sum X^2 = 7{,}494.10 + 8{,}147.53 + 6{,}698.78 = 22{,}340.41$.

$$\sum_{1}^{k_c}\left[\frac{\left(\sum_{1}^{N_c} X\right)^2}{N_c}\right] = \frac{(323.6)^2}{14} + \frac{(360.9)^2}{16} + \frac{(316.8)^2}{15} = 22{,}311.1495.$$

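section 26.1, that⁶

$$\sum_{1}^{k_c}\left[N_c(\bar{X}_c - \bar{X})^2\right] = \sum_{1}^{k_c}\left[\frac{\left(\sum_{1}^{N_c} X\right)^2}{N_c}\right] - \frac{(\sum X)^2}{N},$$

where $\sum_{1}^{N_c}$ refers to a summation of the $N_c$ items in a column and
$N = N_1 + N_2 + N_3$. From the computations shown below Table 26.1,

$$\sum_{1}^{k_c}\left[\frac{\left(\sum_{1}^{N_c} X\right)^2}{N_c}\right] - \frac{(\sum X)^2}{N} = 22{,}311.15 - \frac{1{,}002{,}601.69}{45} = 22{,}311.15 - 22{,}280.04 = 31.11.$$

⁶ If $N_1 = N_2 = N_3 = \cdots$, the expression
$\sum_{1}^{k_c}\left[\left(\sum_{1}^{N_c} X\right)^2 / N_c\right]$ may be written
$\sum_{1}^{k_c}\left(\sum_{1}^{N_c} X\right)^2 / N_c$.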


2. Variation within columns. Variation within columns is the variation
of the values in the columns from the column means. It is obtained
by taking the difference between each item in a column and the column
mean, squaring the differences, summing the squared differences for the
column, performing the same operations for the other columns, and
summing the sums for the columns. Symbolically, variation within
columns is

$$\sum_{1}^{k_c}\left[\sum_{1}^{N_c}(X - \bar{X}_c)^2\right].$$

This expression involves the computation of $k_c$ column means and the
determination of $N$ differences. These operations are unnecessary, since
Appendix S, section 26.2 shows that

$$\sum_{1}^{k_c}\left[\sum_{1}^{N_c}(X - \bar{X}_c)^2\right] = \sum X^2 - \sum_{1}^{k_c}\left[\frac{\left(\sum_{1}^{N_c} X\right)^2}{N_c}\right],$$

and, again referring to the computations below Table 26.1, we find

$$22{,}340.41 - 22{,}311.15 = 29.26.$$

3. Total variation. Total variation is the sum of the squared deviations
of all the values from the grand mean. It is the same as $Ns^2$, where $s$ is
the standard deviation, which was explained in Chapter 10. Symbolically,
total variation is

$$\sum_{1}^{N}(X - \bar{X})^2.$$

It is not necessary to obtain the $N$ deviations called for by this expression,
since, by a procedure similar to that shown in Appendix S, section 10.2,
it may be shown that

$$\sum_{1}^{N}(X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{N}.$$

For the cuckoo-egg data,

$$\sum X^2 - \frac{(\sum X)^2}{N} = 22{,}340.41 - \frac{1{,}002{,}601.69}{45} = 22{,}340.41 - 22{,}280.04 = 60.37.$$


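Notice that the sum of the first two values which we obtained equals the
third value. That is: variation between column means + variation
within columns = total variation. This is true for all problems such as
this, since

$$\left\{\sum_{1}^{k_c}\left[\frac{\left(\sum_{1}^{N_c} X\right)^2}{N_c}\right] - \frac{(\sum X)^2}{N}\right\} + \left\{\sum X^2 - \sum_{1}^{k_c}\left[\frac{\left(\sum_{1}^{N_c} X\right)^2}{N_c}\right]\right\} = \sum X^2 - \frac{(\sum X)^2}{N}.$$

As will be seen later, no use will be made of the numerical value for
total variation. Nevertheless, it is well to compute it as a check on the
other values.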

Estimated variances. It is our objective to compare the estimated
variance between column means with the estimated variance within
columns in order to ascertain whether the column means differ more than
might be accounted for by chance. The estimated variance within
columns is our yardstick of chance variance, since the variation of the
items in the columns is not affected by differences between $\bar{X}_1$,
$\bar{X}_2$, $\bar{X}_3$, $\cdots$. Estimated variance is obtained from
variation by dividing variation by the appropriate number of degrees of
freedom. For our problem, estimated variance between column means has
$n = 2$, since the deviations of the three column means were taken from
$\bar{X}$. For estimated variance within columns,
$n = N_1 - 1 + N_2 - 1 + N_3 - 1 = 14 - 1 + 16 - 1 + 15 - 1 = 42$, since the
deviations in each column were taken from the column mean.

The computation of the estimated variances is indicated in Table 26.2,
and from these we get

$$F = \frac{15.56}{0.6967} = 22.3,$$

with $n_1 = 2$ and $n_2 = 42$. The F table of Appendix M does not contain
a row for $n_2 = 42$, but it is, nevertheless, clear that the probability of
getting $F \geq 22.3$ is much less than 0.001, and we conclude that there is a
real difference between the mean lengths of the eggs found in the nests of
the three species of birds.⁷ It is of interest that later non-statistical
investigations revealed that European cuckoos exhibit what is known as
"host specificity," which means that "different tribes, or gentes, exist
within the species, even in the same area, each adherent to a different
host species and each specialized in at least one respect for that one
species."⁸
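The whole computation for one criterion of classification can be sketched as follows. The function name `one_way_anova` and the returned dictionary keys are ours; the shortcut expressions are the ones given above, and the data are those of Table 26.1. The probability for the resulting F must still be judged from Appendix M.

```python
def one_way_anova(columns):
    """Variation, degrees of freedom, and F for one criterion of classification."""
    N = sum(len(c) for c in columns)
    grand_sum = sum(sum(c) for c in columns)
    correction = grand_sum ** 2 / N                    # (sum X)^2 / N
    sum_sq = sum(x * x for c in columns for x in c)    # sum X^2
    col_term = sum(sum(c) ** 2 / len(c) for c in columns)

    between = col_term - correction    # variation between column means
    within = sum_sq - col_term         # variation within columns
    total = sum_sq - correction        # total variation (check value)

    n1 = len(columns) - 1              # degrees of freedom, between
    n2 = N - len(columns)              # degrees of freedom, within
    F = (between / n1) / (within / n2)
    return {"between": between, "within": within, "total": total,
            "n1": n1, "n2": n2, "F": F}


hedge_sparrow = [22.0, 23.9, 20.9, 23.8, 25.0, 24.0, 21.7,
                 23.8, 22.8, 23.1, 23.1, 23.5, 23.0, 23.0]
robin = [21.8, 23.0, 23.3, 22.4, 22.4, 23.0, 23.0, 23.0,
         23.9, 22.3, 22.0, 22.6, 22.0, 22.1, 21.1, 23.0]
wren = [19.8, 22.1, 21.5, 20.9, 22.0, 21.0, 22.3, 21.0,
        20.3, 20.9, 22.0, 20.0, 20.8, 21.2, 21.0]

result = one_way_anova([hedge_sparrow, robin, wren])
print({k: round(v, 2) for k, v in result.items()})
```

The three variations reproduce the values 31.11, 29.26, and 60.37 obtained above, and the first two add to the third, which is the check recommended in the text.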

TABLE 26.2

Summary of Computations for Analysis of Variance of Data of
Length of Cuckoo's Eggs

                             Amount of     Degrees of     Estimated
Source of variation          variation     freedom        variance
Between column means           31.11           2           15.56
Within columns                 29.26          42            0.6967
Total                          60.37          44

The hypothesis which we tested was that the estimated variance
between column means and the estimated variance within columns were
from the same population with respect to $\sigma^2$. The hypothesis was
discredited. If a sample is drawn from a normal homogeneous population,
we could expect the two estimated variances just mentioned and
$\hat{\sigma}^2$ (an estimate based on total variation) to be equally good
estimates of $\sigma^2$. But if heterogeneity is present, as it was in our
illustration, the estimated variance between column means and
$\hat{\sigma}^2$ are both affected by that heterogeneity. Estimated variance
within columns is not affected, and therefore provided our measure of chance
variance.

The F test for the data of length of cuckoo's eggs involved a situation
in which $n_1 = 2$ and $n_2 = 42$. If we had had two columns of observed
data in Table 26.1, instead of three columns, $n_1$ would have been 1 and our
problem would have been that of testing the significance of the difference
between $\bar{X}_1$ and $\bar{X}_2$, which was considered in Chapter 24. In
fact, whenever an estimated variance has $n_1 = 1$ in an F test, the t test is
an alternative which yields the same probability. This will be clear if we
look at Appendices I and M. From these it may be seen that, for any
given probability, the value for $t^2$ is the same as the value for F when $n$
for t equals $n_2$ for F and when $n_1$ for F is 1. An instance in which the

⁷ L. H. C. Tippett comes to the same conclusion using data of cuckoo's eggs in the
nests of six species of birds. See his The Methods of Statistics, Williams and Norgate,
Ltd., London, 1937, 2nd Ed., pp. 132-134.

⁸ See "Social Parasites Among Birds," by Alden H. Miller, The Scientific Monthly,
Vol. LXII, p. 243.
t test could be used in place of F occurs in the test of the estimated
variance between column means shown in Table 26.6.
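The equivalence of $t^2$ and F when $n_1 = 1$ can be checked numerically. The sketch below is an illustration of ours: it assumes the pooled-variance form of t for two independent samples, and, purely for brevity, uses only the first four entries of the hedge-sparrow and wren columns of Table 26.1, so the numbers are not results from the text.

```python
import math


def pooled_t(a, b):
    """t for the difference between two means, using the pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def one_way_f(columns):
    """F = estimated variance between column means / within columns."""
    N = sum(len(c) for c in columns)
    correction = sum(sum(c) for c in columns) ** 2 / N
    col_term = sum(sum(c) ** 2 / len(c) for c in columns)
    sum_sq = sum(x * x for c in columns for x in c)
    between = (col_term - correction) / (len(columns) - 1)
    within = (sum_sq - col_term) / (N - len(columns))
    return between / within


# Shortened columns, for illustration only.
hedge = [22.0, 23.9, 20.9, 23.8]
wren = [19.8, 22.1, 21.5, 20.9]

t = pooled_t(hedge, wren)
F = one_way_f([hedge, wren])
print(round(t * t, 4), round(F, 4))
```

With two columns the two printed values agree, which is why the t table (with $n$ equal to $n_2$) and the F table (with $n_1 = 1$) lead to the same probability.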

Two criteria of classification, one entry in each box. The data
of Table 26.1 had but one criterion of classification, the type of nest in
which the cuckoo's eggs were found. In Table 26.3 there are two criteria

TABLE 26.3

Computation of Values Required for Analysis of Variance of Data of Strength
of Lead in Number 2 Pencils Manufactured by "Company D"

A. Observed data, in kilograms, and sums.

Location
of test      Pencil 1   Pencil 2   Pencil 3   Pencil 4   Pencil 5   Row sum     Square of
on pencil      X_1        X_2        X_3        X_4        X_5       sum X       row sum
I              1.82       1.70       1.70       1.82       1.92       8.96       80.2816
II             1.56       1.36       1.68       1.98       1.86       8.44       71.2336
III            1.78       1.54       2.02       1.82       1.64       8.80       77.4400
IV             1.74       1.92       1.92       1.64       1.75       8.97       80.4609
Column sum     6.90       6.52       7.32       7.26       7.17      35.17      309.4161

Data from tests of pencils of various brands conducted in 1934 for the Eagle Pencil Co.

B. Squares of observed data and sum.

Location
of test
on pencil     X_1^2      X_2^2      X_3^2      X_4^2      X_5^2      Total
I             3.3124     2.8900     2.8900     3.3124     3.6864    16.0912
II            2.4336     1.8496     2.8224     3.9204     3.4596    14.4856
III           3.1684     2.3716     4.0804     3.3124     2.6896    15.6224
IV            3.0276     3.6864     3.6864     2.6896     3.0625    16.1525
Total        11.9420    10.7976    13.4792    13.2348    12.8981    62.3517

$N_c = 4$, $N_r = 5$, $N = 20$.

$(\sum X)^2 = (35.17)^2 = 1{,}236.9289$.

$$\sum_{1}^{k_c}\left(\sum_{1}^{N_c} X\right)^2 = (6.90)^2 + (6.52)^2 + (7.32)^2 + (7.26)^2 + (7.17)^2 = 247.8193.$$

$$\sum_{1}^{k_r}\left(\sum_{1}^{N_r} X\right)^2 = 309.4161.$$

of classification: (1) the different pencils, of which there were five, and 
(2) the location on the pencil where the test was made, of which there 
were four for each pencil. Each pencil was sharpened and tested, then 
sharpened again and tested, and so on. It is conceivable that changes 
in location may be associated with a progressive increase or decrease of 
strength of the lead. 

Table 26.3 has 5 × 4 = 20 boxes⁹ or cells of observed data, in each of
which there is but a single entry. We shall see later that it is desirable
to have more than one entry in a box, if that is possible. However, there
are some situations, such as the present one, in which only one entry is
possible. We could include more pencils or we could test each pencil
at more locations, but we could not have more than one test at a given
location on a pencil.

⁹ The term "box" is used in this text, since we have already used $\bar{X}_c$ to
indicate the mean of a column and shall later use $\bar{X}_b$ to indicate the mean
of a box.

For the data of Table 26.3, we have variation between column means 
and total variation, as before. However, there is no variation within 
columns, but instead, there is variation between row means and a residual 
variation representing a difference between (1) total variation and (2) 
variation between column means plus variation between row means. 
We shall first compute each of these variations. 

Total variation. The expression is the same as that previously used,
and for the data of Table 26.3, we have

$$\sum X^2 - \frac{(\sum X)^2}{N} = 62.3517 - \frac{1{,}236.9289}{20} = 0.505255.$$


Variation between column means may also be obtained by use of the
expression used before, but, as pointed out in footnote 6, it may be
slightly simplified when the number of items in the columns is the same.
For the pencil data,

$$\frac{\sum_{1}^{k_c}\left(\sum_{1}^{N_c} X\right)^2}{N_c} - \frac{(\sum X)^2}{N} = \frac{247.8193}{4} - \frac{1{,}236.9289}{20} = 0.108380.$$

Variation between row means. This concept is the exact parallel of that
just given. Using the following symbols,

$\bar{X}_r$, the mean of a row,

$N_r$, the number of items in a row,

$k_r$, the number of rows,

$\sum_{1}^{N_r}$, a sum over the $N_r$ items in a row, and

$\sum_{1}^{k_r}$, a sum over the $k_r$ rows,

and remembering that the number of items in the rows is the same, we
have

$$\frac{\sum_{1}^{k_r}\left(\sum_{1}^{N_r} X\right)^2}{N_r} - \frac{(\sum X)^2}{N} = \frac{309.4161}{5} - \frac{1{,}236.9289}{20} = 0.036775.$$

Residual variation. The sum of the variation between column means
and the variation between row means is less than total variation. This
difference, which is

(0.505255) - (0.108380 + 0.036775) = 0.360100,

is ordinarily referred to as "residual variation," since it is usually
computed as a residual. It is possible to compute this value directly by
means of the expression

$$\sum (X + \bar{X} - \bar{X}_r - \bar{X}_c)^2.$$

For the data of Table 26.3, this time-consuming computation gives
0.360100, the same value as was obtained as a residual.
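Both routes to the residual can be compared in a short sketch. The function names are ours and the layout (a list of rows) is an assumption made for convenience; the data are those of Part A of Table 26.3.

```python
# Strength of lead, in kilograms: rows are locations I-IV, columns are pencils 1-5.
pencils = [
    [1.82, 1.70, 1.70, 1.82, 1.92],
    [1.56, 1.36, 1.68, 1.98, 1.86],
    [1.78, 1.54, 2.02, 1.82, 1.64],
    [1.74, 1.92, 1.92, 1.64, 1.75],
]


def variations_single_entry(table):
    """Total, column, row, and residual variation; one entry in each box."""
    k_r, k_c = len(table), len(table[0])
    N = k_r * k_c
    correction = sum(sum(row) for row in table) ** 2 / N
    total = sum(x * x for row in table for x in row) - correction
    col_sums = [sum(row[j] for row in table) for j in range(k_c)]
    row_sums = [sum(row) for row in table]
    columns = sum(s * s for s in col_sums) / k_r - correction  # N_c = k_r items
    rows = sum(s * s for s in row_sums) / k_c - correction     # N_r = k_c items
    return total, columns, rows, total - columns - rows


def residual_direct(table):
    """Sum of (X + grand mean - row mean - column mean)^2 over all boxes."""
    k_r, k_c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (k_r * k_c)
    row_means = [sum(row) / k_c for row in table]
    col_means = [sum(row[j] for row in table) / k_r for j in range(k_c)]
    return sum((table[i][j] + grand - row_means[i] - col_means[j]) ** 2
               for i in range(k_r) for j in range(k_c))


total, columns, rows, residual = variations_single_entry(pencils)
print(round(total, 6), round(columns, 6), round(rows, 6), round(residual, 6))
print(round(residual_direct(pencils), 6))
```

The subtraction and the direct expression give the same residual, which is a useful check on the arithmetic.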

Estimated variances. Table 26.4 summarizes the foregoing results and
shows also the number of degrees of freedom and the estimated variances.

TABLE 26.4

Summary of Computations for Analysis of Variance of Data of
Strength of Lead in Pencils

                             Amount of     Degrees of     Estimated
Source of variation          variation     freedom        variance
Between column means         0.108380          4          0.027095
Between row means            0.036775          3          0.012258
Residual                     0.360100         12          0.030008
Total                        0.505255         19

Since there are five column means, the variation of which was computed
around $\bar{X}$, variation between column means has four degrees of freedom.
Variation between row means involved four means, the variation of which
was in relation to $\bar{X}$, so variation between row means has three degrees
of freedom. Since total variation has $N - 1 = 20 - 1 = 19$ degrees
of freedom, residual variation has $19 - (4 + 3) = 12$ degrees of freedom.

From the estimated variances of Table 26.4, we may now make two F
tests, one for column means:

$$F = \frac{0.027095}{0.030008} = 0.903; \qquad n_1 = 4,\ n_2 = 12,$$

and the other for row means:

$$F = \frac{0.012258}{0.030008} = 0.408; \qquad n_1 = 3,\ n_2 = 12.$$


Since neither of these F values exceeds 1.0, it is clear that neither the
estimated variance between column means (that is, between pencils) nor
the estimated variance between row means (that is, between locations)
exceeds our estimate of chance variance. Therefore, no significance test
is needed.¹⁰ If the reader is interested in knowing whether either F value
is significantly less than 1.0, he may proceed as indicated earlier: compute
$1/F$ and look up this value in Appendix M with the degrees of freedom
reversed. He will find that neither of the F values is significantly less
than 1.0.

The denominator for both of the F values computed above was esti- 
mated residual variance; that was our measure of chance variance, since 
it was the only one of the four sources of variation which would not be 
affected by heterogeneity. The fact that there was but one entry in a
box in Table 26.3 makes it impossible to evaluate two elements which are 
present and separable when there is more than one entry in a box. These 
are: (1) interaction between the two criteria of classification and (2) 
variation within boxes. 

Two criteria of classification, more than one entry in a box.

Part I of Table 26.5 shows data of life in minutes of nine brands of flash- 
light cells when in new condition and after 6-12 months’ storage. Here 
there are two criteria of classification, as before, but there are five entries 
in each box. Total variation is now made up of four components: 
variation between column means, variation between row means, inter- 
action between column and row means, and variation within boxes. 
Using the sums shown in Table 26.5, we shall proceed to obtain the 
numerical values of all of these. 

Total variation. The expression for total variation is the same as
previously used.

$$\sum X^2 - \frac{(\sum X)^2}{N} = 34{,}325{,}736 - \frac{2{,}874{,}460{,}996}{90} = 34{,}325{,}736 - 31{,}938{,}455.51 = 2{,}387{,}280.49.$$

¹⁰ If we ignore the locations on the pencils where the tests were made, the data of
Table 26.3 form a problem with one criterion of classification. On this basis, also,
variance between column means (that is, between pencils) is not significant. See the
first edition of this text, pp. 356-359.
Variation between column means employs the same formula as in the
preceding illustration, since the number of items in the two columns of
Part I of Table 26.5 is the same.

$$\frac{\sum_{1}^{k_c}\left(\sum_{1}^{N_c} X\right)^2}{N_c} - \frac{(\sum X)^2}{N} = \frac{1{,}454{,}015{,}716}{45} - \frac{2{,}874{,}460{,}996}{90} = 32{,}311{,}460.36 - 31{,}938{,}455.51 = 373{,}004.85.$$

Variation between row means also uses the same expression as in the
preceding example, since the number of items in the nine rows of Part I
of Table 26.5 is the same.

$$\frac{\sum_{1}^{k_r}\left(\sum_{1}^{N_r} X\right)^2}{N_r} - \frac{(\sum X)^2}{N} = \frac{333{,}359{,}050}{10} - \frac{2{,}874{,}460{,}996}{90} = 33{,}335{,}905 - 31{,}938{,}455.51 = 1{,}397{,}449.49.$$


Variation within boxes. This is the variation of the items in the boxes
around the means of the boxes. Symbolically it is

$$\sum_{1}^{k_b}\left[\sum_{1}^{N_b}(X - \bar{X}_b)^2\right],$$

where

$\bar{X}_b$ is the mean of a box,

$N_b$ is the number of items in a box,

$k_b$ is the number of boxes,

$\sum_{1}^{N_b}$ is a sum over the $N_b$ items in a box, and

$\sum_{1}^{k_b}$ is a sum over the $k_b$ boxes.

By a process similar to that shown in Appendix S, section 26.2, this
expression becomes

$$\sum X^2 - \sum_{1}^{k_b}\left[\frac{\left(\sum_{1}^{N_b} X\right)^2}{N_b}\right].$$

However, there is the same number of items in each of the boxes of Table
TABLE 26.5

Computation of Values Required for Analysis of Variance of Data of Life
of Type D Flashlight Cells*

I. Observed data and sums for columns and rows

Brand    New                          After storage                Row sum
A        696  728  730  683  720      612  513  558  479  495       6,214
B        661  646  693  674  678      643  642  636  678  646       6,597
C        749  757  832  787  760      722  670  649  718  448       7,092
D        840  734  845  798  885      706  657  728  576  746       7,515
E        690  733  736  691  659      628  648  602  622  640       6,649
F        733  757  714  608  693      672  604  622  576  658       6,637
G        478  734  635  672  410      296  455  320  272  480       4,752
H        470  586  395  414  438      413  543  138   38  234       3,669
I        680  507  362  458  555      352  408  544  227  396       4,489
Column
sum      29,704                       23,910                       53,614

II. Squares and sums for columns and rows

Brand  Condition       Squares of observed data                          Row sum
A      New             484,416  529,984  532,900  466,489  518,400
       After storage   374,544  263,169  311,364  229,441  245,025      3,955,732
B      New             436,921  417,316  480,249  454,276  459,684
       After storage   413,449  412,164  404,496  459,684  417,316      4,355,555
C      New             561,001  573,049  692,224  619,369  577,600
       After storage   521,284  448,900  421,201  515,524  200,704      5,130,856
D      New             705,600  538,756  714,025  636,804  783,225
       After storage   498,436  431,649  529,984  331,776  556,516      5,726,771
E      New             476,100  537,289  541,696  477,481  434,281
       After storage   394,384  419,904  362,404  386,884  409,600      4,440,023
F      New             537,289  573,049  509,796  369,664  480,249
       After storage   451,584  364,816  386,884  331,776  432,964      4,438,071
G      New             228,484  538,756  403,225  451,584  168,100
       After storage    87,616  207,025  102,400   73,984  230,400      2,491,574
H      New             220,900  343,396  156,025  171,396  191,844
       After storage   170,569  294,849   19,044    1,444   54,756      1,624,223
I      New             462,400  257,049  131,044  209,764  308,025
       After storage   123,904  166,464  295,936   51,529  156,816      2,162,931

Column sums: New, 20,361,174; After storage, 13,964,562; Total, 34,325,736 = sum X^2.

III. Sums and squares of sums for boxes

                      Box sum       Square of
Box                   sum X         box sum
Row 1, Col. 1          3,557       12,652,249
       Col. 2          2,657        7,059,649
Row 2, Col. 1          3,352       11,235,904
       Col. 2          3,245       10,530,025
Row 3, Col. 1          3,885       15,093,225
       Col. 2          3,207       10,284,849
Row 4, Col. 1          4,102       16,826,404
       Col. 2          3,413       11,648,569
Row 5, Col. 1          3,509       12,313,081
       Col. 2          3,140        9,859,600
Row 6, Col. 1          3,505       12,285,025
       Col. 2          3,132        9,809,424
Row 7, Col. 1          2,929        8,579,041
       Col. 2          1,823        3,323,329
Row 8, Col. 1          2,303        5,303,809
       Col. 2          1,366        1,865,956
Row 9, Col. 1          2,562        6,563,844
       Col. 2          1,927        3,713,329
Total                 53,614      168,947,312

* Life of a cell is the time in minutes for cell voltage to drop to 0.90 volts
when tested as in Federal Specification W-B-101b. Type D cells are the
largest flashlight size.

Data in Part I furnished through the courtesy of Consumers' Research,
Washington, New Jersey, from tests of flashlight batteries reported in
CR's August 1953 Bulletin.

$(\sum X)^2 = (53{,}614)^2 = 2{,}874{,}460{,}996$.

$$\sum_{1}^{k_c}\left(\sum_{1}^{N_c} X\right)^2 = (29{,}704)^2 + (23{,}910)^2 = 1{,}454{,}015{,}716.$$

$$\sum_{1}^{k_r}\left(\sum_{1}^{N_r} X\right)^2 = (6{,}214)^2 + (6{,}597)^2 + (7{,}092)^2 + (7{,}515)^2 + (6{,}649)^2 + (6{,}637)^2 + (4{,}752)^2 + (3{,}669)^2 + (4{,}489)^2 = 333{,}359{,}050.$$

$$\sum_{1}^{k_b}\left(\sum_{1}^{N_b} X\right)^2 = 168{,}947{,}312.$$


26.5, Part I; so we can write

$$\sum X^2 - \frac{\sum_{1}^{k_b}\left(\sum_{1}^{N_b} X\right)^2}{N_b} = 34{,}325{,}736 - \frac{168{,}947{,}312}{5} = 34{,}325{,}736 - 33{,}789{,}462.4 = 536{,}273.6.$$
Interaction. The numerical value for total variation exceeds the sum
of the three variations last obtained. This difference is the variation due
to interaction between column means and row means. Its numerical
value is

2,387,280.49 - (373,004.85 + 1,397,449.49 + 536,273.6) = 80,552.55.

Alternatively, but much more laboriously, interaction may be computed
directly from

$$\sum_{1}^{k_b}\left[N_b(\bar{X}_b + \bar{X} - \bar{X}_r - \bar{X}_c)^2\right].$$
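The four components for the flashlight-cell data can be reproduced with the sketch below. The function name, the dictionary layout, and the variable names are ours; interaction is obtained as the remainder, as in the text. Small differences in the second decimal place arise only because the text rounds intermediate quotients.

```python
# Life in minutes from Part I of Table 26.5: brand -> (new, after storage).
cells = {
    "A": ([696, 728, 730, 683, 720], [612, 513, 558, 479, 495]),
    "B": ([661, 646, 693, 674, 678], [643, 642, 636, 678, 646]),
    "C": ([749, 757, 832, 787, 760], [722, 670, 649, 718, 448]),
    "D": ([840, 734, 845, 798, 885], [706, 657, 728, 576, 746]),
    "E": ([690, 733, 736, 691, 659], [628, 648, 602, 622, 640]),
    "F": ([733, 757, 714, 608, 693], [672, 604, 622, 576, 658]),
    "G": ([478, 734, 635, 672, 410], [296, 455, 320, 272, 480]),
    "H": ([470, 586, 395, 414, 438], [413, 543, 138, 38, 234]),
    "I": ([680, 507, 362, 458, 555], [352, 408, 544, 227, 396]),
}


def variations_with_replication(boxes):
    """boxes: list of rows; each row is a list of boxes of equal size."""
    k_r, k_c, n_b = len(boxes), len(boxes[0]), len(boxes[0][0])
    items = [x for row in boxes for box in row for x in box]
    N = len(items)
    correction = sum(items) ** 2 / N
    sum_sq = sum(x * x for x in items)

    col_sums = [sum(sum(row[j]) for row in boxes) for j in range(k_c)]
    row_sums = [sum(sum(box) for box in row) for row in boxes]
    box_sums = [sum(box) for row in boxes for box in row]

    total = sum_sq - correction
    columns = sum(s * s for s in col_sums) / (k_r * n_b) - correction
    rows = sum(s * s for s in row_sums) / (k_c * n_b) - correction
    within = sum_sq - sum(s * s for s in box_sums) / n_b
    interaction = total - (columns + rows + within)   # remainder
    return {"total": total, "columns": columns, "rows": rows,
            "within": within, "interaction": interaction}


v = variations_with_replication([list(pair) for pair in cells.values()])
print({name: round(value, 2) for name, value in v.items()})
```

Here each row of the table is a brand and each box holds the five cells of one brand in one condition, so $N_c = 45$, $N_r = 10$, and $N_b = 5$.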

Estimated variances. Table 26.6 shows the amount of variation, the 
degrees of freedom, and the estimated variance for each source of varia- 
tion; total variation and the degrees of freedom for total variation are also 

TABLE 26.6

Summary of Computations for Analysis of Variance of Data of Life
of Type D Flashlight Cells

                             Amount of        Degrees of     Estimated
Source of variation          variation        freedom        variance
Between column means           373,004.85          1         373,004.85
Between row means            1,397,449.49          8         174,681.19
Interaction                     80,552.55          8          10,069.07
Within boxes                   536,273.6          72           7,448.24
Total                        2,387,280.49         89

shown. The number of degrees of freedom for variation within boxes is
$k_b(N_b - 1) = 72$, since the deviation of each item in a box was taken from
the mean of the box. Degrees of freedom for interaction are obtained by
subtracting the degrees of freedom for the other three sources of variation
from the degrees of freedom for total variation. Thus, the number of
degrees of freedom for interaction is

89 - (1 + 8 + 72) = 8.

We are now ready to test the estimated variance between column
means and the estimated variance between row means. However, we
must first decide which of the other two variances is to be the denominator
of the F test. It is true that the variation within boxes is the only one
of the four sources of variation which would be unaffected by heterogeneity
among column, row, or box means. It would therefore appear
that estimated variance within boxes should be our measure of chance.
But there is another point to consider: if the difference between row (or
column) means is not greater than the interaction between row and
column means, the difference can hardly be considered meaningful.¹¹
Consequently, the usual procedure is as follows: first test the estimated
variance of interaction against the estimated variance within boxes; if
the estimated variance of interaction is significantly larger than the
estimated variance within boxes, test each of the other two estimated
variances against the estimated variance of interaction; if the estimated
variance of interaction is smaller than, or is not significantly larger than,
the estimated variance within boxes, pool the variation and the degrees
of freedom from these two sources and compute a new estimated variance
to be used as the denominator for the F test.¹²

Testing first the estimated variance of interaction against estimated
variance within boxes, we have

$$F = \frac{10{,}069.07}{7{,}448.24} = 1.35. \qquad (n_1 = 8;\ n_2 = 72.)$$


From Appendix M it is seen that this value of F is not significantly 
greater than 1.0, so estimated variance of interaction does not signifi- 
cantly exceed the estimated variance within boxes. 

Since interaction is not significant, we pool the variation of interaction 
and within boxes, and divide this value by the degrees of freedom for 
these two sources of variation, giving 

$$616{,}826.15 \div 80 = 7{,}710.33.$$

This is the denominator of F for testing estimated variance between
column means and estimated variance between row means.

For column means,

$$F = \frac{373{,}004.85}{7{,}710.33} = 48.38. \qquad (n_1 = 1;\ n_2 = 80.)$$


This point is not so easy to grasp from the data of Table 26.5 as it is from an
illustration given by Mood. His example, for which no data are given, deals with
five men (columns) operating four machines (rows) and has three observations in each
box. He notes that one man may do better on one machine than another man, but
the first man may not do as much better or may even do worse on a second machine.
To be meaningful, the differences between machines should exceed the interaction;
otherwise, one might install what appeared to be the best machine but find that the
man assigned to operate that machine is not as productive on it as he would have
been on another machine. See A. M. Mood, Introduction to the Theory of Statistics,
McGraw-Hill Book Company, New York, 1950, pp. 334-337.

Some authorities recommend using the larger of the two variances attributable
to interaction or within boxes. If estimated variance of interaction is the larger,
but not significantly so, this procedure allows for possible small effects of interaction
not revealed when estimated variance of interaction was tested. It also tends to
increase the number of Type II errors.

From Appendix M it is seen that this value of F is far beyond the 0.001
point, so the difference between column means (between fresh and stored 
cells) is real. 

For row means,

$$F = \frac{174{,}681.19}{7{,}710.33} = 22.66. \qquad (n_1 = 8;\ n_2 = 80.)$$

This F value, too, is beyond the 0.001 point, and the difference between
row means (between brands of cells) is significant.
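
The testing sequence just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the estimated variances and the pooled variation are the figures quoted in the text, the function and variable names are hypothetical, and the judgment that interaction is not significant is taken from the F table (Appendix M) rather than computed.

```python
# Two criteria of classification with several items per box:
# choosing the denominator of the F test.
# Numbers are those quoted in the text; names are illustrative assumptions.

def pooled_variance(pooled_variation: float, df_interaction: int, df_within: int) -> float:
    """Estimated variance from pooled interaction and within-box variation."""
    return pooled_variation / (df_interaction + df_within)

var_interaction, df_interaction = 10_069.07, 8
var_within, df_within = 7_448.24, 72
var_columns, var_rows = 373_004.85, 174_681.19

# Step 1: test interaction against variation within boxes.
f_interaction = var_interaction / var_within

# Step 2: the F table shows this ratio is not significant, so pool the
# two sources (pooled variation 616,826.15 with 8 + 72 degrees of freedom).
denominator = pooled_variance(616_826.15, df_interaction, df_within)

# Step 3: test column means and row means against the pooled variance.
f_columns = var_columns / denominator
f_rows = var_rows / denominator

print(round(f_interaction, 2), round(denominator, 2),
      round(f_columns, 2), round(f_rows, 2))
```

Had interaction proved significant, the same ratios would instead use the estimated variance of interaction as the denominator.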


Situations in which there are two criteria of classification with unequal
numbers of items in the boxes, and those involving three or more criteria
of classification, are beyond the scope of this book.


Interrelationships Between Normal, t, $\chi^2$, and F

In Chapter 24 it was noted that the t distribution approaches the normal
distribution as n approaches infinity. The normal distribution is
therefore a special case of the t distribution, as shown in the last row of
Appendix I.

In Chapter 25 it was pointed out that, for the same set of data, normal
deviates yield the same probabilities as do $\chi^2$ values when n = 1 for $\chi^2$.
More specifically, we found, upon comparing Appendices H and J, that

$$\left(\frac{x}{\sigma}\right)^2 = \chi^2$$

for a given probability when n = 1 for $\chi^2$.

In this chapter it was noted that, for any given probability, $\chi^2/n = F$
when n for $\chi^2$ equals $n_1$ for F and when $n_2 = \infty$ for F. This may be seen
by comparing Appendices J and M.

In this chapter, also, it was pointed out that for any given probability,
$t^2 = F$ when n for t equals $n_2$ for F and when $n_1$ for F is 1. This is apparent
from an examination of Appendices I and M.

What has been said in the preceding four paragraphs has been brought
together in Chart 26.2. From this chart it is clear that F is an inclusive
distribution in that the other three distributions are merely special cases
of F.


MEASURES OF SKEWNESS AND KURTOSIS 

Skewness. In Chapter 10 the skewness of the distribution of the
grades of 225 midshipmen, as measured by $\beta_1$, was found to be 0.18.

See H. M. Walker and J. Lev, Statistical Inference, Henry Holt and Company,
New York, 1953, pp. 363-386.

Chart 26.2. Relationship Between the Normal, t, $\chi^2$, and F Distributions*

Each box within the double rules may be thought of as the end of a drawer which,
when pulled out, reveals the F values and, in some instances, the squared normal
($N^2$), $t^2$, and $\chi^2/n$ values for the indicated probabilities. The entire diagram is F. The
box at the extreme lower left is $N^2$. The left column is $t^2$. The bottom row is $\chi^2/n$.

* This chart is an elaboration of one given in K. Mather, Statistical Analysis in Biology,
p. 47, Interscience Publishers, New York, 1943.


Using 0.05 as a criterion, is this value of $\beta_1$ significantly greater than 0?
Egon S. Pearson has prepared tables of the 0.10 and 0.02 limits of $\beta_1$
when based on samples drawn from a normal population. This table is
shown as Appendix O, and the small chart included with that appendix
shows the shape of the distribution of $\beta_1$. Appendix O does not show the
values of $\beta_1$ for N = 225, but for either N = 200 or N = 250 the value
$\beta_1 = 0.18$ is beyond the 0.02 point. Significant skewness is present.

In Chapter 10 the value of $\beta_1$ for the distribution of ages at death of 371
American inventors was found to be 0.16. From Appendix O this value,
also, is seen to be significantly greater than zero.

In Chapter 23 a normal curve was fitted to the distribution of baseball
throws for distance by 303 first-year high school girls. $\beta_1$ was found to
be 0.0104. The value for $\beta_1$ does not differ significantly from 0, as may
be seen from Appendix O.

Kurtosis. Table 10.9 showed a leptokurtic distribution, the cost of
building five-room wood houses, with $\beta_2 = 4.46$ and N = 82. With 0.05
as our criterion, is this value of 4.46 significantly different from 3.0, the
value of $\beta_2$ for a normal distribution? Appendix P shows the upper and
lower 0.01 and 0.05 limits of $\beta_2$ when based on random samples from a
normal distribution. Since Appendix P shows no entries for values of
N below 100, we cannot be sure whether or not $\beta_2 = 4.46$ is beyond the
upper 0.01 point, but it is probably beyond 0.05.

In Table 10.10 a distribution of the length of life of a group of electric
lamps was found to have $\beta_2 = 2.22$. We cannot make a test to determine
whether 2.22 is significantly less than 3.0, since the data of Table 10.10
were in terms of percentage frequencies and we do not know the number
of lamps involved. However, if we look at Appendix P, we may note
that $\beta_2 = 2.18$ is at the lower 0.01 limit and $\beta_2 = 2.35$ is at the lower 0.05
limit when the sample consists of but 100 items. For samples of 125
items or more, $\beta_2 = 2.22$ is beyond the 0.01 point. If the data of Table
10.10 include 100 or more lamps (and they should, or percentages should
not have been shown), the distribution is significantly platykurtic.

CORRELATION COEFFICIENTS 

Simple correlation. When a correlation analysis has been made for 
a sample, a number of questions may be raised. Among them are: Does 
the value of r differ significantly from zero? Does the value of r differ 
significantly from a specified value other than zero? Do two r values 
differ significantly from each other? What are the confidence limits 
of the correlation in the population? What single estimate of the cor- 
relation in the population may be made? We shall consider each of these 
in turn. 

Does the value of r differ significantly from zero? Here we test the
hypothesis that there is no correlation in the population, that is, that
$r_{\mathcal{P}} = 0$. If the hypothesis is discredited, the correlation is considered
significant. The procedure involves the t test with which the
reader is already familiar. The value of t is obtained from

$$t = r\sqrt{\frac{N - 2}{1 - r^2}},$$

after which we ascertain P from Appendix I with n = N - 2. (Two
degrees of freedom are lost because of the two constants in the estimating
equation.) For the data of height growth and diameter growth of
trees, N was 20 and r was +0.758. These give

$$t = 0.758\sqrt{\frac{20 - 2}{1 - 0.574}} = 4.93.$$

When n = 20 - 2 = 18, Appendix I shows that t = 4.93 has P < 0.001.
Consequently, the value of r is significant.

It is of interest that this test is the same as the test to ascertain whether
b differs significantly from zero. The expression to use is

$$t = b\sqrt{\frac{\Sigma x^2 (N - 2)}{\Sigma y_s^2}}.$$

For the tree data, we found b = +1.677, $\Sigma x^2 = 42.6055$, and $\Sigma y_s^2 = 88.74$.
Consequently,

$$t = 1.677\sqrt{\frac{42.6055(20 - 2)}{88.74}} = 4.93,$$

the same as obtained before.
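
As a check on the arithmetic, the two routes to t can be computed side by side. The sketch below uses only the figures quoted for the tree data; the function names are hypothetical, and `sum_ys2` stands for the unexplained variation (written $\Sigma y_s^2$ here, which is itself a reconstruction of the book's symbol).

```python
import math

def t_from_r(r: float, n_pairs: int) -> float:
    """t for testing whether r differs from zero, with N - 2 degrees of freedom."""
    return r * math.sqrt((n_pairs - 2) / (1 - r ** 2))

def t_from_b(b: float, sum_x2: float, sum_ys2: float, n_pairs: int) -> float:
    """The same test expressed through the slope b and the unexplained variation."""
    return b * math.sqrt(sum_x2 * (n_pairs - 2) / sum_ys2)

t_r = t_from_r(0.758, 20)
t_b = t_from_b(1.677, 42.6055, 88.74, 20)
print(round(t_r, 2), round(t_b, 2))
```

Both values agree to two decimals; the small difference beyond that comes from rounding in r and b.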

Does the value of r differ significantly from a specified value other than
zero? When $r_{\mathcal{P}} = 0$, the distribution of values of r from random samples
is symmetrical about 0, ranging from -1.0 to +1.0. When $r_{\mathcal{P}} \neq 0$, the
distribution of values of r from random samples is not symmetrical around
$r_{\mathcal{P}}$, and the t test is inappropriate. To test whether r differs significantly
from a value of $r_{\mathcal{P}} \neq 0$, we transform r into

$$z = 1.15129 \log \frac{1 + r}{1 - r},$$

the distribution of which is approximately normal around

$$z_{\mathcal{P}} = 1.15129 \log \frac{1 + r_{\mathcal{P}}}{1 - r_{\mathcal{P}}},$$

with the standard error of z being

$$\sigma_z = \frac{1}{\sqrt{N - 2.6667}}.$$

A more complete statement is this: We know that $t^2 = F$ when $n_1$ for F is 1 and
when n for t equals $n_2$ for F. The F test corresponding to the above t test is

$$F = \frac{\Sigma y_c^2 \div 1}{\Sigma y_s^2 \div (N - 2)}.$$

Explained variation has 2 - 1 = 1 degree of freedom, since it is based upon the
deviations of the $Y_c$ values ($Y_c = a + bX$) from $\bar{Y}$. Unexplained variation has
N - 2 degrees of freedom, since it is based upon the deviations of the N Y values from
$Y_c = a + bX$.

For proof of the equality, see Appendix S, section 26.3. A number of alternative
formulas for testing r or b are available. Among these are:

$$t = \sqrt{\frac{b\,\Sigma xy\,(N - 2)}{\Sigma y^2 - b\,\Sigma xy}} \qquad \text{and} \qquad t = \sqrt{\frac{(\Sigma xy)^2 (N - 2)}{\Sigma x^2\,\Sigma y^2 - (\Sigma xy)^2}}.$$

Suppose that we wish to know whether our r of +0.758 for the tree-growth
data differs significantly from a hypothetical $r_{\mathcal{P}}$ of +0.750. We
compute

$$z = 1.15129 \log \frac{1 + 0.758}{1 - 0.758} = 0.992;$$

$$z_{\mathcal{P}} = 1.15129 \log \frac{1 + 0.750}{1 - 0.750} = 0.973;$$

$$\sigma_z = \frac{1}{\sqrt{20 - 2.6667}} = 0.240; \text{ and}$$

$$\frac{x}{\sigma} = \frac{z - z_{\mathcal{P}}}{\sigma_z} = \frac{0.992 - 0.973}{0.240} = \frac{0.019}{0.240} = 0.079.$$

Appendix H tells us that we may expect a difference this large or larger
owing to chance causes about 94 times in 100. The hypothesis that
r = +0.758 is the correlation of a random sample from a population
having $r_{\mathcal{P}} = +0.750$ is not impugned. The difference is not significant.
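
The z transformation and the resulting normal-deviate test can be sketched as follows. This is an illustration, not the authors' procedure verbatim: the function names are hypothetical, and the two-sided probability is computed from the normal distribution with `math.erfc` instead of being read from Appendix H. Because unrounded values of z are carried through, the deviate comes out slightly below the 0.079 obtained above from three-decimal figures.

```python
import math

def fisher_z(r: float) -> float:
    """z = 1.15129 * log10((1 + r) / (1 - r))."""
    return 1.15129 * math.log10((1 + r) / (1 - r))

def z_standard_error(n_pairs: int) -> float:
    """Standard error of z in the form used in the text: 1 / sqrt(N - 2.6667)."""
    return 1 / math.sqrt(n_pairs - 2.6667)

def two_sided_normal_p(deviate: float) -> float:
    """Probability of a normal deviate this large or larger in absolute value."""
    return math.erfc(abs(deviate) / math.sqrt(2))

z_sample = fisher_z(0.758)        # sample correlation
z_hypothesis = fisher_z(0.750)    # hypothetical population correlation
sigma_z = z_standard_error(20)
deviate = (z_sample - z_hypothesis) / sigma_z
p_value = two_sided_normal_p(deviate)
print(round(z_sample, 3), round(z_hypothesis, 3), round(sigma_z, 3), round(p_value, 2))
```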

See R. A. Fisher, Statistical Methods for Research Workers, Hafner Publishing Co.,
New York, 1950, 11th ed., pp. 197-204.

The usual expression is $1/\sqrt{N - 3}$. For explanation of that given here, see
"New Light on the Correlation Coefficient and its Transforms," by Harold Hotelling,
Journal of the Royal Statistical Society, Series B, Vol. XV, No. 2, 1953, p. 220. On
pages 223-224, Hotelling suggests two modifications of z which may be more nearly
normal than the form given above.

Do two values of r differ significantly from each other? If we were interested
in testing the significance of the difference between the value of
r = +0.758 ($z_1 = 0.992$) for our sample and that of another sample r of
+0.750 ($z_2 = 0.973$), obtained from 20 pairs of items, we would compute

$$\sigma_{z_1} = \frac{1}{\sqrt{20 - 2.6667}} = 0.240;$$

$$\sigma_{z_2} = \frac{1}{\sqrt{20 - 2.6667}} = 0.240;$$

$$\sigma_{z_1 - z_2} = \sqrt{\sigma_{z_1}^2 + \sigma_{z_2}^2} = \sqrt{(0.240)^2 + (0.240)^2} = 0.339; \text{ and}$$

$$\frac{x}{\sigma} = \frac{z_1 - z_2}{\sigma_{z_1 - z_2}} = \frac{0.992 - 0.973}{0.339} = \frac{0.019}{0.339} = 0.056.$$

The table of normal areas (Appendix H) gives P = 0.95, and we conclude
that the difference is not significant.

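Confidence limits of $r_{\mathcal{P}}$. As in the case of $\bar{X}_{\mathcal{P}}$, $\pi$, and $\sigma$, we may wish to
know the confidence limits of $r_{\mathcal{P}}$. These are obtained by use of the
expression

$$z \pm \frac{x}{\sigma}\,\sigma_z.$$

This will give us two values for $z_{\mathcal{P}}$, which are then converted to $r_{\mathcal{P}}$ values.
If we wish the 95 per cent confidence limits ($x/\sigma = 1.960$) for the tree-growth
data, where r was +0.758 and z = 0.992, we have

$$z_{\mathcal{P}} = 0.992 \pm (1.960)(0.240) = 0.992 \pm 0.4704.$$

$$z_{\mathcal{P}_1} = 0.5216 \text{ and } z_{\mathcal{P}_2} = 1.4624.$$

Converting $z_{\mathcal{P}_1}$ to $r_{\mathcal{P}_1}$ and $z_{\mathcal{P}_2}$ to $r_{\mathcal{P}_2}$ gives

$$r_{\mathcal{P}_1} = +0.479 \text{ and } r_{\mathcal{P}_2} = +0.898,$$

which are the 95 per cent confidence limits.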
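
The conversion from z back to r is not spelled out above; since $1.15129 \log_{10}\frac{1+r}{1-r}$ is the inverse hyperbolic tangent of r, the reverse step is the hyperbolic tangent. A minimal sketch under that assumption follows (hypothetical function name; standard library only). Carrying unrounded intermediate values moves the lower limit by about one unit in the third decimal.

```python
import math

def r_confidence_limits(r: float, n_pairs: int, deviate: float = 1.960):
    """Confidence limits for the population correlation via the z transformation."""
    z = math.atanh(r)                         # same as 1.15129 * log10((1 + r) / (1 - r))
    sigma_z = 1 / math.sqrt(n_pairs - 2.6667)
    z_lower = z - deviate * sigma_z
    z_upper = z + deviate * sigma_z
    # tanh converts each z limit back to the r scale.
    return math.tanh(z_lower), math.tanh(z_upper)

lower, upper = r_confidence_limits(0.758, 20)
print(round(lower, 3), round(upper, 3))
```

Note that the limits are not symmetrical about r = +0.758, which reflects the skewness of the sampling distribution of r when the population correlation is not zero.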

Single estimate of $r_{\mathcal{P}}$. When discussing variances, we noted that a
single estimate of $\sigma^2$ might be made from a sample by means of

$$\hat{\sigma}^2 = \frac{\Sigma x^2}{N - 1}.$$

In somewhat similar fashion, an estimate may be made of $r_{\mathcal{P}}^2$. We shall
refer to it as $\hat{r}^2$. We use $\hat{r}^2$ rather than the more logical $\hat{r}_{\mathcal{P}}^2$ to indicate an
estimate of the coefficient of determination in the population, in order to
avoid complicated subscripts in later sections of this chapter.

We already know, from footnote 8 in Chapter 19, that

$$r^2 = 1 - \frac{s_{Y.X}^2}{s_Y^2} = 1 - \frac{\Sigma y_s^2 \div N}{\Sigma y^2 \div N}.$$

Now, $s_{Y.X}$ is a biased estimate of $\sigma_{Y.X}$ and $s_Y$ is a biased estimate of $\sigma_Y$.
Unbiased estimates are obtained by dividing the measures of variation
by the appropriate number of degrees of freedom, rather than by N.
Thus,

$$\hat{\sigma}_Y^2 = \frac{\Sigma y^2}{N - 1}; \qquad \hat{\sigma}_{Y.X}^2 = \frac{\Sigma y_s^2}{N - 2}; \text{ and}$$

$$\hat{r}^2 = 1 - \frac{\hat{\sigma}_{Y.X}^2}{\hat{\sigma}_Y^2} = 1 - \frac{\Sigma y_s^2 \div (N - 2)}{\Sigma y^2 \div (N - 1)}.$$

Since

$$\frac{\Sigma y_s^2}{\Sigma y^2} = 1 - r^2,$$

we may write

$$\hat{r}^2 = 1 - (1 - r^2)\,\frac{N - 1}{N - 2}.$$

For the tree-growth data, where $r^2 = 0.574$ and r = +0.758:

$$\hat{r}^2 = 1 - (1 - 0.574)\,\frac{20 - 1}{20 - 2} = 0.550.$$

$$\hat{r} = +0.742.$$

When $r^2$ is very low, $\hat{r}^2$ may be negative. In such a case, the correlation
in the population should be considered to be zero.
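
The adjustment can be written once for any estimating equation with m constants, since only the divisor N - m changes. The sketch below is an illustration with a hypothetical function name; it also applies the rule just stated, treating a negative result as zero.

```python
def adjusted_determination(r_squared: float, n_items: int, n_constants: int) -> float:
    """Estimate of the population coefficient of determination.

    Uses 1 - (1 - r^2)(N - 1)/(N - m), where m is the number of constants
    in the estimating equation (m = 2 for a straight line). A negative
    result is reported as zero.
    """
    estimate = 1 - (1 - r_squared) * (n_items - 1) / (n_items - n_constants)
    return max(estimate, 0.0)

tree_estimate = adjusted_determination(0.574, 20, 2)
print(round(tree_estimate, 3), round(tree_estimate ** 0.5, 3))
```

With m = 3 the same function serves for a second-degree curve, and with m equal to the number of column means it serves for the correlation ratio.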

Non-linear correlation. When dealing with a second-degree curve,
a third-degree curve, or a curve of higher order, we may wish to know:
(1) whether the non-linear coefficient of determination is significantly
larger than a coefficient based upon a curve of lower order, or (2) whether
the non-linear coefficient is significantly greater than zero. We may also
occasionally wish to make an estimate of the correlation in the population.

Second-degree curve. For the data of diameter and volume of ponderosa
pine trees, we found, in Chapter 20, that

$$r^2 = \frac{\text{Variation explained by straight line}}{\text{Total variation}} = \frac{\Sigma y_c^2}{\Sigma y^2} = \frac{152{,}259.2}{159{,}698} = 0.953,$$

and

$$r_{Y.XX^2}^2 = \frac{\text{Variation explained by second-degree curve}}{\text{Total variation}} = \frac{\Sigma y_{cY.XX^2}^2}{\Sigma y^2} = \frac{156{,}235.5}{159{,}698} = 0.978.$$


The simplest method of ascertaining whether $r_{Y.XX^2}^2$ is significantly larger
than $r^2$ is to compute the measure $r_{YX^2.X}^2$ mentioned in footnote 2 of
Chapter 20, and make a t test of $r_{YX^2.X}$ with n = N - 3. (Explanation
of the use of N - 3 is given below.) This coefficient of partial
determination, which tells us the proportion that (1) the added
variation explained by the use of $X^2$ constitutes of (2) the variation
unexplained by the straight line, is

$$r_{YX^2.X}^2 = \frac{r_{Y.XX^2}^2 - r^2}{1 - r^2} = \frac{0.978 - 0.953}{1 - 0.953} = 0.532.$$

The t test is exactly the same as the t test for r, except that we use
N - 3 instead of N - 2.

$$t = \sqrt{\frac{r_{YX^2.X}^2 (N - 3)}{1 - r_{YX^2.X}^2}} = \sqrt{\frac{0.532(20 - 3)}{0.468}} = 4.4.$$

When n = 17, a value of t = 4.4 is beyond the 0.001 level (see Appendix I),
so we conclude that the use of $X^2$ has explained a significantly
larger amount of variation.

The foregoing is a simpler equivalent of the usual F test in which

$$F = \frac{\left[\begin{pmatrix}\text{Variation explained by}\\ \text{second-degree curve}\end{pmatrix} - \begin{pmatrix}\text{Variation explained}\\ \text{by straight line}\end{pmatrix}\right] \div \text{Degrees of freedom}}{\left[\begin{pmatrix}\text{Total}\\ \text{variation}\end{pmatrix} - \begin{pmatrix}\text{Variation explained by}\\ \text{second-degree curve}\end{pmatrix}\right] \div \text{Degrees of freedom}} = \frac{(\Sigma y_{cY.XX^2}^2 - \Sigma y_c^2) \div 1}{(\Sigma y^2 - \Sigma y_{cY.XX^2}^2) \div (N - 3)},$$

with $n_1 = 1$ and $n_2 = N - 3$. The number of degrees of freedom in the
numerator is 2 - 1 = 1, because it is the difference between the number
of degrees of freedom for explained variation computed from the second-degree
curve (which is two) and the number of degrees of freedom for
explained variation computed from the straight line (which is one).
Explained variation obtained from the second-degree curve has 3 - 1 = 2
degrees of freedom because the equation has three constants and the variation
of the computed values was taken around $\bar{Y}$; explained variation
gotten from the straight line has 2 - 1 = 1 degree of freedom because
the equation has two constants and the variation of the computed values
was taken around $\bar{Y}$. The number of degrees of freedom for
$\Sigma y_{sY.XX^2}^2 = \Sigma y^2 - \Sigma y_{cY.XX^2}^2$, the denominator, is N - 3 because the unexplained
variation was obtained from the squared differences of the Y values (of
which there are N) from a second-degree curve, which has three constants.
Alternatively, we may note that total variation has N - 1 degrees of
freedom and that explained variation has 3 - 1 degrees of freedom;
therefore, their difference, which is unexplained variation, has
(N - 1) - (3 - 1) = N - 3 degrees of freedom.

If the numerator and denominator of the expression given above for
F are each divided by $\Sigma y^2$, we have the alternative form

$$F = \frac{(r_{Y.XX^2}^2 - r^2) \div 1}{(1 - r_{Y.XX^2}^2) \div (N - 3)},$$

with $n_1 = 1$ and $n_2 = N - 3$.
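
The equivalence of the t test and the F test for an added term can be seen numerically. The sketch below works from the measures of variation quoted for the ponderosa pine data instead of the rounded coefficients, so the partial coefficient comes out at about 0.535 where the text obtained 0.532 from three-decimal values; the function name and return layout are illustrative assumptions.

```python
import math

def added_term_test(explained_full: float, explained_reduced: float,
                    total: float, n_items: int, constants_full: int):
    """Partial determination, t, and F for one term added to an estimating equation."""
    added = explained_full - explained_reduced
    unexplained_reduced = total - explained_reduced
    unexplained_full = total - explained_full
    df = n_items - constants_full
    partial_r2 = added / unexplained_reduced
    t = math.sqrt(partial_r2 * df / (1 - partial_r2))
    f = (added / 1) / (unexplained_full / df)
    return partial_r2, t, f

# Straight line versus second-degree curve, N = 20, three constants in the curve.
partial_r2, t, f = added_term_test(156_235.5, 152_259.2, 159_698.0, 20, 3)
print(round(partial_r2, 3), round(t, 1), round(f, 1), round(t ** 2, 1))
```

The square of t and the value of F coincide, which is the relationship $t^2 = F$ with $n_1 = 1$.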

To ascertain whether $r_{Y.XX^2}^2 = 0.978$ is significantly greater than 0, we
use the F test, computing either

$$F = \frac{r_{Y.XX^2}^2 \div (3 - 1)}{(1 - r_{Y.XX^2}^2) \div (N - 3)}$$

or

$$F = \frac{\Sigma y_{cY.XX^2}^2 \div (3 - 1)}{(\Sigma y^2 - \Sigma y_{cY.XX^2}^2) \div (N - 3)},$$

with $n_1 = 3 - 1$ and $n_2 = N - 3$. We use (3 - 1) degrees of freedom
in the numerator because the second-degree curve has three constants
and explained variation computed from that curve was taken around $\bar{Y}$;
more generally, the degrees of freedom for explained variation are
(m - 1), where m is the number of constants in the estimating equation.
The number of degrees of freedom in the denominator was explained in
the preceding paragraph; in general, the number of degrees of freedom for
unexplained variation is (N - m).

The equivalence of the t test and the F test for this and other coefficients of partial
determination is shown in Appendix S, section 26.4.

If both numerator and denominator of the second expression are divided by $\Sigma y^2$,
the first expression is obtained.

Using the first expression for the data of ponderosa pine trees, we get

$$F = \frac{0.978 \div (3 - 1)}{(1 - 0.978) \div (20 - 3)} = 379.1 \text{ (only two digits are significant)},$$

with $n_1 = 2$ and $n_2 = 17$. Referring to the F table of Appendix M, it is
clear that this F value significantly exceeds 1.0, since it has a probability
of much less than 0.001, and that, therefore, $r_{Y.XX^2}^2$ significantly exceeds
zero.

The procedure for making an estimate of the correlation in the population
is similar to that previously given for linear correlation. That is,

$$\hat{r}_{Y.XX^2}^2 = 1 - \frac{\Sigma y_{sY.XX^2}^2 \div (N - 3)}{\Sigma y^2 \div (N - 1)} = 1 - (1 - r_{Y.XX^2}^2)\,\frac{N - 1}{N - 3}$$

$$= 1 - (1 - 0.978)\,\frac{19}{17} = 0.975.$$


Third-degree curve. To ascertain whether the use of $X^3$ in a curve of
the type

$$Y_c = a + bX + cX^2 + dX^3$$

explains a significant additional amount of variation, compute

$$r_{YX^3.XX^2}^2 = \frac{r_{Y.XX^2X^3}^2 - r_{Y.XX^2}^2}{1 - r_{Y.XX^2}^2}$$

and then make a t test using

$$t = \sqrt{\frac{r_{YX^3.XX^2}^2 (N - 4)}{1 - r_{YX^3.XX^2}^2}},$$

with n = N - 4. The equivalent F test is

$$F = \frac{(\Sigma y_{cY.XX^2X^3}^2 - \Sigma y_{cY.XX^2}^2) \div 1}{(\Sigma y^2 - \Sigma y_{cY.XX^2X^3}^2) \div (N - 4)} = \frac{(r_{Y.XX^2X^3}^2 - r_{Y.XX^2}^2) \div 1}{(1 - r_{Y.XX^2X^3}^2) \div (N - 4)},$$

with $n_1 = 1$ and $n_2 = N - 4$.

To test the hypothesis that the population correlation is zero, compute

$$F = \frac{r_{Y.XX^2X^3}^2 \div (4 - 1)}{(1 - r_{Y.XX^2X^3}^2) \div (N - 4)}$$

or

$$F = \frac{\Sigma y_{cY.XX^2X^3}^2 \div (4 - 1)}{\Sigma y_{sY.XX^2X^3}^2 \div (N - 4)},$$

with $n_1 = 4 - 1$ and $n_2 = N - 4$. Remember that
$\Sigma y_{sY.XX^2X^3}^2 = \Sigma y^2 - \Sigma y_{cY.XX^2X^3}^2$.

The estimate of the correlation in the population is

$$\hat{r}_{Y.XX^2X^3}^2 = 1 - \frac{\Sigma y_{sY.XX^2X^3}^2 \div (N - 4)}{\Sigma y^2 \div (N - 1)} = 1 - (1 - r_{Y.XX^2X^3}^2)\,\frac{N - 1}{N - 4}.$$


The reader can readily adapt these expressions for curves of a higher 
order. That, however, should rarely be necessary, since third-degree 
curves are not often used and curves of higher order are even more infre- 
quently employed. 

The correlation ratio. For the data of yield per acre of broom corn and
man hours per ton, we found in Chapter 20 that

$$\eta_{Y.X}^2 = \frac{\text{Variation explained by column means}}{\text{Total variation of the } Y \text{ series}} = \frac{148.115}{217.515} = 0.681.$$

If a second-degree curve is fitted to the same data, we get

$$r_{Y.XX^2}^2 = \frac{\Sigma y_{cY.XX^2}^2}{\Sigma y^2} = \frac{140.743}{217.515} = 0.647.$$

For the correlation analysis of these data using a second-degree curve, see the
first edition of this text, pp. 721-727.

To ascertain whether $\eta_{Y.X}^2$ is significantly larger than $r_{Y.XX^2}^2$, we compute

$$F = \frac{(\eta_{Y.X}^2 - r_{Y.XX^2}^2) \div \text{Degrees of freedom}}{(1 - \eta_{Y.X}^2) \div \text{Degrees of freedom}} = \frac{(0.681 - 0.647) \div (11 - 2)}{(1 - 0.681) \div (103 - 12)} = \frac{0.00378}{0.00351} = 1.1,$$

with $n_1 = 9$ and $n_2 = 91$. Or, we may use

$$F = \frac{\left[\begin{pmatrix}\text{Variation explained}\\ \text{by column means}\end{pmatrix} - \begin{pmatrix}\text{Variation explained by}\\ \text{second-degree curve}\end{pmatrix}\right] \div \text{Degrees of freedom}}{\left[\begin{pmatrix}\text{Total variation}\\ \text{of the } Y \text{ series}\end{pmatrix} - \begin{pmatrix}\text{Variation explained}\\ \text{by column means}\end{pmatrix}\right] \div \text{Degrees of freedom}}$$

$$= \frac{(148.115 - 140.743) \div (11 - 2)}{(217.515 - 148.115) \div (103 - 12)} = \frac{0.8191}{0.7626} = 1.1,$$

with $n_1 = 9$ and $n_2 = 91$. The degrees of freedom in the numerator
represent the difference between the degrees of freedom for explained
variation using the column means (which is 11) and the degrees of freedom
for explained variation using the second-degree curve (which is 2). The
number of degrees of freedom for explained variation using the column
means is 12 - 1 = 11 because there were 12 column means and the
variation of those means was computed in relation to $\bar{Y}$. The number
of degrees of freedom for explained variation using the second-degree
curve is 3 - 1 = 2 because the equation has three constants and the
variation of the computed values was taken around $\bar{Y}$. The degrees of
freedom in the denominator, for the variation unexplained by the column
means, are N minus the number of column means, that is, 103 - 12 = 91.
Referring to Appendix M to ascertain the probability of F = 1.1 when
$n_1 = 9$ and $n_2 = 91$, we find that neither $n_1 = 9$ nor $n_2 = 91$ is shown in
the table. However, it is not necessary to interpolate. By looking at
the F values when $n_1 = 8$ and 12 and $n_2 = 60$ and 120, it is clear that the
probability is greater than 0.10 and that $\eta_{Y.X}^2$ is not significantly larger
than $r_{Y.XX^2}^2$.

To determine whether $\eta_{Y.X}^2$ is significantly greater than zero, we use
expressions for F similar to those previously employed for the same purpose
for non-linear coefficients. They are

$$F = \frac{\eta_{Y.X}^2 \div (\text{Degrees of freedom} = \text{Number of column means} - 1)}{(1 - \eta_{Y.X}^2) \div (\text{Degrees of freedom} = N - \text{Number of column means})}$$

$$= \frac{0.681 \div (12 - 1)}{(1 - 0.681) \div (103 - 12)} = \frac{0.0619}{0.00351} = 17.6,$$

or,

$$F = \frac{\begin{pmatrix}\text{Variation explained}\\ \text{by column means}\end{pmatrix} \div \begin{pmatrix}\text{Degrees of freedom} = \text{Number}\\ \text{of column means} - 1\end{pmatrix}}{\left[\begin{pmatrix}\text{Total}\\ \text{variation of}\\ \text{the } Y \text{ series}\end{pmatrix} - \begin{pmatrix}\text{Variation}\\ \text{explained by}\\ \text{column means}\end{pmatrix}\right] \div \begin{pmatrix}\text{Degrees of freedom} =\\ N - \text{Number of}\\ \text{column means}\end{pmatrix}}$$

$$= \frac{148.115 \div (12 - 1)}{(217.515 - 148.115) \div (103 - 12)} = \frac{13.46}{0.763} = 17.6.$$

For this value of F, $n_1 = 11$ and $n_2 = 91$. Neither of these is tabled in
Appendix M; but, looking up $n_1 = 8$ or 12 and $n_2 = 60$ or 120, it is clear
that F = 17.7 is far beyond the upper 0.001 point. $\eta_{Y.X}^2$ is significantly
greater than zero.
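
Both tests of the correlation ratio can be reproduced from the three measures of variation. The following is a minimal sketch using the broom corn figures quoted above; the variable names are hypothetical. Working from unrounded quotients gives about 17.66 for the second ratio, which explains why it appears as both 17.6 and 17.7 in the text.

```python
# Correlation ratio: figures quoted in the text for the broom corn data.
total_variation = 217.515
explained_by_column_means = 148.115
explained_by_second_degree = 140.743
n_items, n_column_means, curve_constants = 103, 12, 3

# Variance unexplained by the column means (denominator of both tests).
df_unexplained = n_items - n_column_means
unexplained_variance = (total_variation - explained_by_column_means) / df_unexplained

# Test 1: do the column means explain significantly more than the curve?
df_difference = (n_column_means - 1) - (curve_constants - 1)
f_vs_curve = ((explained_by_column_means - explained_by_second_degree)
              / df_difference) / unexplained_variance

# Test 2: is the correlation ratio significantly greater than zero?
f_vs_zero = (explained_by_column_means / (n_column_means - 1)) / unexplained_variance

print(df_difference, df_unexplained, round(f_vs_curve, 1), round(f_vs_zero, 2))
```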

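The value of $\hat{\eta}_{Y.X}^2$, an estimate for the population, is

$$\hat{\eta}_{Y.X}^2 = 1 - \frac{\left[\begin{pmatrix}\text{Total variation}\\ \text{of the } Y \text{ series}\end{pmatrix} - \begin{pmatrix}\text{Variation explained}\\ \text{by column means}\end{pmatrix}\right] \div \begin{pmatrix}N - \text{Number of}\\ \text{column means}\end{pmatrix}}{(\text{Total variation of the } Y \text{ series}) \div (N - 1)}$$

or

$$\hat{\eta}_{Y.X}^2 = 1 - (1 - \eta_{Y.X}^2)\,\frac{N - 1}{N - \text{Number of column means}}$$

$$= 1 - (1 - 0.681)\,\frac{102}{91} = 0.642.$$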


Multiple correlation. When dealing with multiple correlation
coefficients, we are primarily interested in knowing whether a given
$R^2$ (or R) value is significant. We shall not use the example of Chapter 21
as an illustration, because the data used there were not a sample. Instead
we shall consider a four-variable problem dealing with the physical
measurements of 27 white boys who were 12, 13, or 14 weeks old. The
variables were:

$X_1$, weight in kilograms,

$X_2$, height in centimeters,

$X_3$, head circumference in centimeters, and

$X_4$, chest circumference in centimeters.

These and other data for boys and girls of various ages were supplied by the
New York Foundling Hospital, courtesy of Dr. Alfred J. Vignec. Miss Marion C.
Gentile kindly transcribed the figures.

We shall test $R_{1.23}^2$ and $R_{1.234}^2$, and, to do that, we need the following
values:

N = 27.

$\Sigma x_1^2 = 11.6258$.

$\Sigma x_{c1.23}^2 = 9.1085$;

$\Sigma x_{s1.23}^2 = 2.5173$;

$R_{1.23}^2 = 0.783$.

$\Sigma x_{c1.234}^2 = 10.0152$;

$\Sigma x_{s1.234}^2 = 1.6106$;

$R_{1.234}^2 = 0.861$.

To ascertain whether a multiple coefficient of determination significantly
exceeds zero, we employ an F test, similar to those used for the
same purpose for non-linear coefficients. In general form, we may use
either

$$F = \frac{R_{1.234\cdots m}^2 \div (m - 1)}{(1 - R_{1.234\cdots m}^2) \div (N - m)},$$

or,

$$F = \frac{\Sigma x_{c1.234\cdots m}^2 \div (m - 1)}{\Sigma x_{s1.234\cdots m}^2 \div (N - m)},$$

with $n_1 = m - 1$ and $n_2 = N - m$.

Using the first expression to test $R_{1.23}^2$ gives

$$F = \frac{0.783 \div (3 - 1)}{(1 - 0.783) \div (27 - 3)} = \frac{0.392}{0.00904} = 43.4,$$

with $n_1 = 2$ and $n_2 = 24$. From Appendix M, the value obtained for
F is seen to be far beyond the upper 0.001 point, and $R_{1.23}^2$ is clearly
significant.

The equivalence of the two expressions is fairly obvious: in the denominator of
the second expression, write $\Sigma x_1^2 - \Sigma x_{c1.234\cdots m}^2$ in place of $\Sigma x_{s1.234\cdots m}^2$; then divide
the numerator and the denominator by $\Sigma x_1^2$; the result is the first expression.

Again, using the first of the two expressions, but this time to test
$R_{1.234}^2$, we obtain

$$F = \frac{0.861 \div (4 - 1)}{(1 - 0.861) \div (27 - 4)} = \frac{0.287}{0.00604} = 47.5,$$

with $n_1 = 3$ and $n_2 = 23$. $R_{1.234}^2$ is also significant.
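
The general expression lends itself to a small helper. The sketch below is illustrative only (the function name is hypothetical) and uses the rounded coefficients quoted in the text; because the text rounds the numerator to 0.392 before dividing, its first ratio (43.4) is about 0.1 higher than the unrounded quotient.

```python
def f_for_determination(r_squared: float, n_items: int, n_constants: int):
    """F ratio and degrees of freedom for testing a coefficient of determination against zero.

    n_constants is m, the number of constants in the estimating equation,
    so n1 = m - 1 and n2 = N - m.
    """
    n1 = n_constants - 1
    n2 = n_items - n_constants
    f = (r_squared / n1) / ((1 - r_squared) / n2)
    return f, n1, n2

f_three, n1_three, n2_three = f_for_determination(0.783, 27, 3)   # X1 on X2, X3
f_four, n1_four, n2_four = f_for_determination(0.861, 27, 4)      # X1 on X2, X3, X4
print(round(f_three, 1), (n1_three, n2_three), round(f_four, 1), (n1_four, n2_four))
```

The resulting ratios would then be referred to an F table such as Appendix M with the degrees of freedom returned.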

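Occasionally one may wish the value of $\hat{R}_{1.234\cdots m}^2$, the estimated coefficient
of multiple determination in the population. This is

$$\hat{R}_{1.234\cdots m}^2 = 1 - \frac{\Sigma x_{s1.234\cdots m}^2 \div (N - m)}{\Sigma x_1^2 \div (N - 1)} = 1 - (1 - R_{1.234\cdots m}^2)\,\frac{N - 1}{N - m}.$$

Computing only $\hat{R}_{1.234}^2$ for the data of the 27 white boys, we obtain

$$\hat{R}_{1.234}^2 = 1 - (1 - R_{1.234}^2)\,\frac{N - 1}{N - m} = 1 - (1 - 0.861)\,\frac{27 - 1}{27 - 4} = 0.843.$$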


Partial correlation. Since a coefficient of partial determination tells
us the proportion that (1) the additional explained variation attributable
to a given independent variable is of (2) the unexplained variation before
the use of that independent variable, we are often interested in knowing
whether the coefficient differs significantly from zero. The test involves
computing

$$t = \sqrt{\frac{r_{1m.23\cdots(m-1)}^2 (N - m)}{1 - r_{1m.23\cdots(m-1)}^2}},$$

with n = N - m.

For the data of the physical measurements of the 27 white boys,

$$r_{14.23}^2 = \frac{R_{1.234}^2 - R_{1.23}^2}{1 - R_{1.23}^2}$$

or

$$r_{14.23}^2 = \frac{\Sigma x_{c1.234}^2 - \Sigma x_{c1.23}^2}{\Sigma x_{s1.23}^2}.$$

Using the first expression gives

$$r_{14.23}^2 = \frac{0.861 - 0.783}{1 - 0.783} = 0.359.$$

Variable $X_4$ explained 36 per cent of the variation which $X_2$ and $X_3$ had
failed to explain.

For the value of t we get

$$t = \sqrt{\frac{0.359(27 - 4)}{1 - 0.359}} = 3.59,$$

with n = 23. From the t table of Appendix I, it is seen that 0.001 <
P < 0.01, and we consider $r_{14.23}^2$ to be significant.
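
The two steps above, forming the partial coefficient from two multiple coefficients and then converting it to t, are easily chained. A brief sketch with hypothetical function names, using the rounded $R^2$ values from the text:

```python
import math

def partial_determination(r2_with: float, r2_without: float) -> float:
    """Share of previously unexplained variation accounted for by the added variable."""
    return (r2_with - r2_without) / (1 - r2_without)

def t_for_partial(partial_r2: float, n_items: int, n_variables: int) -> float:
    """t with N - m degrees of freedom, m being the number of variables involved."""
    return math.sqrt(partial_r2 * (n_items - n_variables) / (1 - partial_r2))

r2_14_23 = partial_determination(0.861, 0.783)   # X4 added to X2 and X3
t_14_23 = t_for_partial(r2_14_23, 27, 4)
print(round(r2_14_23, 3), round(t_14_23, 2))
```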

In similar fashion, it may be ascertained whether $r_{13.24}^2$ and $r_{12.34}^2$ are
significant. Without making the tests here, we shall merely note that
$r_{12.34}^2$ is significant at the 0.01 level and that $r_{13.24}^2$ is not significant, even
at the 0.05 level, since P for $r_{13.24}$ is between 0.30 and 0.40. This does not
tell us that we should necessarily exclude $X_3$ from our analysis, since $X_3$
may contribute some useful information even though we have not been
able to demonstrate its significance. However, if we desired to use but
two independent variables, they should, of course, be $X_2$ and $X_4$.

As noted on page 728, the t test is an alternative to the F test for testing
the significance of a partial coefficient of determination. The F test, in
general terms, is

$$F = \frac{(\Sigma x_{c1.234\cdots m}^2 - \Sigma x_{c1.234\cdots(m-1)}^2) \div [m - (m - 1)]}{\Sigma x_{s1.234\cdots m}^2 \div (N - m)},$$

where m - (m - 1) is, of course, always 1. That this expression for F
and the square of that given above for t are the same is demonstrated in
Appendix S, section 26.4.

In rare instances one may wish to know whether a coefficient of partial
determination differs significantly from a population value which is not
zero. Such a test may be made in exactly the same fashion as for the
simple linear correlation coefficient (see pages 722-723), with the standard
error of z being

$$\sigma_z = \frac{1}{\sqrt{N - 2.6667 - (m - 2)}} = \frac{1}{\sqrt{N - m - 0.6667}},$$

where m is the number of variables involved, which is the same as the
number of constants in the multiple estimating equation, since we are
considering only linear multiple correlation.

If one wishes the value of $\hat{r}_{1m.23\cdots(m-1)}^2$, the estimate for the population,
it may be obtained from

$$\hat{r}_{1m.23\cdots(m-1)}^2 = 1 - \frac{\Sigma x_{s1.234\cdots m}^2 \div (N - m)}{\Sigma x_{s1.234\cdots(m-1)}^2 \div [N - (m - 1)]},$$

or, if we divide the numerator and denominator each by $\Sigma x_1^2$, from

$$\hat{r}_{1m.23\cdots(m-1)}^2 = 1 - \frac{(1 - R_{1.234\cdots m}^2) \div (N - m)}{(1 - R_{1.234\cdots(m-1)}^2) \div [N - (m - 1)]}.$$

APPENDIX A

Flexible Calendar of Working Days 

Calendar Days, Sundays, and Holidays, by Months, 1898-1976 

There are 14 distinct calendar patterns, referred to in the calendar by code number.
In the code table below the years are arranged consecutively within columns. Any
year can be located by reading down the proper column. Then read across to obtain
the code number. For instance, 1945 is located by reading down in the fifth
column, and the code number is seen to be IV. Row IV of the calendar gives information
concerning 1945, as well as concerning 1900, 1906, 1917, 1923, 1934, 1951, 1962,
and 1973.
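
The 14 patterns arise because a year's calendar is fixed by two things: the weekday on which January 1 falls and whether the year is a leap year. A short sketch with Python's standard `datetime` and `calendar` modules (the function name is an illustrative assumption) recovers the grouping; it does not reproduce the book's Roman-numeral codes, only which years share a pattern.

```python
import calendar
from datetime import date

def calendar_pattern(year: int):
    """(weekday of January 1 with Monday = 0, leap-year flag) for the given year."""
    return date(year, 1, 1).weekday(), calendar.isleap(year)

years = range(1898, 1977)

# Seven possible starting weekdays, each for common and for leap years.
distinct_patterns = {calendar_pattern(y) for y in years}

# Years whose calendar is identical to that of 1945.
same_as_1945 = [y for y in years if calendar_pattern(y) == calendar_pattern(1945)]

print(len(distinct_patterns), same_as_1945)
```

Movable holidays such as Good Friday and Easter are not fixed by this pattern, which is why the code table flags them separately.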


Code Table

                               Year                                  Code
                                                                    Number

1898     1910fe   1921fe   1927     1938     1949     1955     1966     I
...      ...      ...      1928*    ...      ...      1956*f   ...      II
1899f    1911     1922     ...      1939     1950     ...      1967fe   III
1900†    ...      1923f    ...      ...      1951fe   ...      ...      IV
...      1912*    ...      ...      1940*fe  ...      ...      1968*    V
1901     ...      ...      1929fe   ...      ...      1957     ...      VI
1902fe   1913fe   ...      1930     1941     ...      1958     1969     VII
...      ...      1924*    ...      ...      1952*    ...      ...      VIII
1903     1914     1925     1931     1942     1953     1959fe   1970fe   IX
...      1915     1926     ...      1943     1954     ...      1971     X
1904*    ...      ...      1932*fe  ...      ...      1960*    ...      XI
...      1916*    ...      ...      1944*    ...      ...      1972*f   XII
1905     ...      ...      1933     ...      ...      1961f    ...      III
1906     1917     ...      1934f    1945f    ...      1962     1973     IV
1907fe   1918fe   ...      1935     1946     ...      1963     1974     VI
1908*    ...      ...      1936*    ...      ...      1964*fe  ...      XIII
...      1919     ...      ...      1947     ...      ...      1975fe   VII
1909     ...      ...      1937fe   ...      ...      1965     ...      X
...      1920*    ...      ...      1948*fe  ...      ...      1976*    XIV

* Leap Year; February has 29 days.

† 1900 was not a Leap Year.

f Good Friday occurred in March.

e Easter occurred in March.

From F. E. Croxton and D. J. Cowden, Practical Business Statistics, Second Edition, Prentice-Hall,
New York, 1948, pp. 520-521.

Calendar


The first row for each year gives the number of Sundays in parentheses ( ) and
Saturdays in brackets [ ] in each month. The second row shows the occurrence of
holidays. Holidays occurring on Sundays are enclosed in parentheses; those on
Saturdays are enclosed in brackets. For information concerning the states in which
specific holidays are observed, see The World Almanac (published annually by the New
York World-Telegram and The Sun, New York City).

Following is a key to the symbols used on the calendar:

N New Year's Day: January 1.
L Lincoln's Birthday: February 12.
W Washington's Birthday: February 22.
F Good Friday.
E Easter.
M Memorial Day: May 30.
J Independence Day: July 4.
D Labor Day: First Monday in September.
C Columbus Day: October 12.
V Election Day: First Tuesday after First Monday in November.
A Armistice Day: November 11 (beginning 1918).
T Thanksgiving Day.
X Christmas Day: December 25.

[Calendar table. Rows: code numbers I to XIV. Columns: Code Number; Jan (31 days); Feb (28); Mar (31); Apr (30); May (31); Jun (30); Jul (31); Aug (31); Sep (30); Oct (31); Nov (30); Dec (31). For each code, the first row gives the number of Sundays ( ) and Saturdays [ ] in each month, and the second row gives the holiday symbols from the key above. The individual entries are not legible in this copy.]

APPENDIX B 


Sums of the First Six Powers of the 
First 50 Natural Numbers 

The following table, giving the sums of the first six powers of the first
M natural numbers from M = 1 to M = 50, will be most frequently used
in connection with the fitting of a trend line to time series. For that type of
problem, M is the highest value of X used in the computation table. When


M | ΣX | ΣX² | ΣX³ | ΣX⁴ | ΣX⁵ | ΣX⁶ (each sum taken from X = 1 to X = M)

1 

1 

1 

1 

1 

1 

1 

2 

3 

5 

9 

17 

33 

65

3 

6 

14 

36 

98 

276 

794 

4 

10 

30 

100

354

1 300 

4 890 

5

15

55 

225 

979 

4 425 

20 515

6 

21 

91 

441 

2 275

12 201 

67 171 

7 

28 

140 

784 

4 676 

29 008 

184 820 

8

36 

204 ! 

1 296 

8 772 ! 

61 776 

446 964 

9 

45 

285 

2 025 

15 333 

120 825 

978 405 

10 

55

385 

3 025 

25 333 

220 825 ! 

1 978 405 

11 

66 ^ 

506 

4 356 

39 974 

381 874 

3 749 966 

12 

78 i 

650 1 

6 084 

60 710 

630 708 

6 735 950 

13 

91 i 

819 ; 

8 281 

89 271 

1 002 001 

11 562 759 

14 

105 : 

1 015 ! 

11 025 

127 687 

1 539 826 

19 092 295 

15

120 

1 240 1 

14 400 

178 312 

2 299 200 

30 482 9^ 

16 

136 

1 496 j 

18 496 

243 843 

3 347 776 

47 260 136 

17 

153 

1 785 

23 409 

327 369 

4 767 633 

71 397 705 

18

171 

2 109 

29 241 1 

432 345 

6 657 201 

105 409 929 

19 

190 

2 470 

36 100 

562 666 

9 133 300 

152 455 810 

20 

210 

2 870 

44 100 1 

^722 666 

12 333 300 

216 455 810 

21 

231 

3 311 

S3 361 i 

917 147 

16 417 401 

302 221 931 

22 

253 

3 795 

64 009 

1 151 403 

21 671 033 

415 601 835 

23 

276 

4 324 

76 176 

1 431 244 ! 

28 007 376 

563 637 724 

24 

300 

4 900 

90 000 

1 763 020 1 

35 970 000 

754 740 700 

25 

325 

5 525 

105 625 

2 153 645 

45 735 625 

998 881 325 

26 

361 

6 201 

123 201 

2 610 621 

67 617 001 

1 307 797 101 

27 

378 

6 930 

142 884 

3 142 062 

71 965 908 

1 695 217 590 

28

406 

7 714 

164 836 

3 756 718 

89 176 276 

2 177 107 894 

29 

436 

8 555 

189 226 

4 463 999 

109 687 425 

2 771 931 216 

30 

466 

9 465 

216 225 

5 273 999 ' 

133 987 425 

3 500 931 216 

31 

496 

10 416 

246 016 

6 197 520 

162 616 576 

4 388 434 896 

32 

528 

11 440 

278 784 

7 246 096 

196 171 008 

5 462 176 720 

33 

561 

12 529 

314 721 

8 432 017 

236 306 401 

6 763 644 689 

34 

696 

13 685 

354 026 

9 768 353 

280 741 825 

8 298 449 105 

35 

630 

14 910 

1 396 900 

11 268 978 

333 263 700 

10 136 7X4 730 

36 

666 

16 206 

443 556 

12 948 694 

393 729 876 

12 313 497 066 

37 

703 

17 575 

494 209 

14 822 765 

463 073 833 

14 879 223 475 

38 

741 

19 019 

1 549 081 

16 907 891 

542 309 001 

17 890 159 859 

39 

; 780 

20 640 

1 608 400 

19 221 332 

632 533 200 

21 408 003 620 

40 

1 82c 

22 140 

1 672 400 

21 781 332 

734 933 200 

25 504 903 620 

41 

^ 861 

23 821 

741 321 

24 607 093 

850 789 401 

30 255 007 861 

42 

903 

26 585 

I 815 409 

27 718 789 

981 480 633 

36 744 039 605 

43 

946 

27 434 

894 9X6 

31 137 590 

1 128 489 076 

42 065 402 664 

44 

090 

29 370 

1 980 100 

34 886 686 

1 293 405 300 

49 321 716 510 

45 

I 1 035 

31 395 

i 1 071 225 

38 986 311 

1 477 933 425 

57 625 482 135 

46 

1 081 

33 511 

1 1 168 561 

43 463 767 

1 883 896 401 

67 099 779 031 

47 

1 128 

35 720 

1 272 384 

48 343 448 

1 913 241 408 

77 878 994 360 

48 

1 176 

38 024 

1 382 976 

53 651 864 

2 168 045 376 

90 109 584 824 

49

1 225 

40 425 

1 500 625 

69 416 666 

2 450 620 625 

103 950 872 025 

50

1 275 ^ 

42 925 

1 625 625 

65 666 666 

2 763 020 625 

119 575 872 025

the X origin has been taken at the center of the X values, it is necessary
to multiply the summations shown in this table by two. When the origin
has been taken at the first X value in a time series, N as used in the normal
equations is M + 1; when the origin has been taken at the center of the
X values in a time series, N is 2M + 1.
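
The doubling rule for a centered origin can be checked directly. The sketch below assumes an odd number of periods, so that X runs from -M to +M; the function names are illustrative assumptions, not notation from the text.

```python
def table_value(m, power):
    """Sum of X**power for X = 1..M, i.e. the quantity tabled in this appendix."""
    return sum(x ** power for x in range(1, m + 1))


def centered_power_sums(n_periods):
    """Return (M, N, sums) when the X origin is placed at the middle period."""
    if n_periods % 2 == 0:
        raise ValueError("this sketch covers an odd number of periods only")
    m = (n_periods - 1) // 2      # highest X value in the computation table
    xs = range(-m, m + 1)         # ..., -2, -1, 0, 1, 2, ...
    sums = {p: sum(x ** p for x in xs) for p in range(1, 7)}
    return m, len(xs), sums


m, n, sums = centered_power_sums(11)
print(m, n)                           # M and N = 2M + 1
print(sums[2], 2 * table_value(m, 2)) # even powers: twice the tabled sum
print(sums[1], sums[3])               # odd powers cancel about the center
```

With 11 periods, M is 5 and N is 11; the even-power sums are twice the tabled values, while the odd-power sums vanish, which is what simplifies the normal equations.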

The sums of the first six powers of the first M natural numbers may be 
obtained from the following expressions: 



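$$\sum_{1}^{M} X = \frac{M(M+1)}{2}$$

$$\sum_{1}^{M} X^2 = \left(\frac{2M+1}{3}\right)\sum_{1}^{M} X$$

$$\sum_{1}^{M} X^3 = \left(\sum_{1}^{M} X\right)^2$$

$$\sum_{1}^{M} X^4 = \left(\frac{3M^2+3M-1}{5}\right)\sum_{1}^{M} X^2$$

$$\sum_{1}^{M} X^5 = \left(\frac{2M^2+2M-1}{3}\right)\sum_{1}^{M} X^3$$

$$\sum_{1}^{M} X^6 = \left(\frac{3M^4+6M^3-3M+1}{7}\right)\sum_{1}^{M} X^2$$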
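
Any entry of the table can be recomputed, or a doubtful digit checked, from closed-form expressions for the sums of powers. The sketch below uses the standard closed forms, written so that each higher sum is a multiple of a lower one; the function names are illustrative assumptions, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction


def closed_form_sums(m):
    """Sums of X, X^2, ..., X^6 for X = 1..M from closed-form expressions."""
    s1 = Fraction(m * (m + 1), 2)
    s2 = Fraction(2 * m + 1, 3) * s1
    s3 = s1 ** 2
    s4 = Fraction(3 * m**2 + 3 * m - 1, 5) * s2
    s5 = Fraction(2 * m**2 + 2 * m - 1, 3) * s3
    s6 = Fraction(3 * m**4 + 6 * m**3 - 3 * m + 1, 7) * s2
    return [int(s) for s in (s1, s2, s3, s4, s5, s6)]


def direct_sums(m):
    """The same six sums obtained by straightforward addition."""
    return [sum(x ** p for x in range(1, m + 1)) for p in range(1, 7)]


# The two methods agree over the whole range of the table.
for m in range(1, 51):
    assert closed_form_sums(m) == direct_sums(m)

print(closed_form_sums(5))
```

For M = 5 this gives 15, 55, 225, 979, 4 425, and 20 515.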


A table of the sums of the first 7 powers of the first 100 natural numbers
may be found in E. S. Pearson and H. O. Hartley, Biometrika Tables for
Statisticians, Volume I, Cambridge University Press, Cambridge, 1954,
pp. 224-225, and in Karl Pearson, Tables for Statisticians and Biometricians,
Part I, Cambridge University Press, Cambridge, 1948 (third
edition), pp. 40-41. It appears also on the same pages in earlier editions.



APPENDIX C 

Sums of the First Six Powers of the
First 50 Odd Natural Numbers

This table shows the sums of the first six powers of the first M_o odd
natural numbers from M_o = 1 to M_o = 50. Note that, when M_o = 2,
we have the odd natural numbers 1 and 3; when M_o = 3, reference is to
1, 3, and 5; when M_o = 4, the numbers 1, 3, 5, and 7 are involved; and so


Highest odd natural number | M_o | ΣX_o | ΣX_o² | ΣX_o³ | ΣX_o⁴ | ΣX_o⁵ | ΣX_o⁶ (each sum taken over the first M_o odd numbers)

1 

1 

1 

1 

1 

1 

1 

1 

3 

2 

4 

10 

28 

82 

244 

730 

5 


9 

35 

153 

707 

3 369 

16 355

7 


16 

84 

496 

3 108 

20 176 

134 004 

9 

5 

25 

165 

1 225 


79 226 

666 445 

11 

6 

36 

286 

2 556 

24 310 

240 276 

2 437 008 

13 

7 

49 

455 

4 763 

62 871 

611 569 

7 263 816 

15 

8 

64 

680 

S 128 

103 496 

1 370 944 

18 654 440 

17 

9 

81 

969 

13 041 

187 017 

2 790 801 

42 792 009 

19 

10 


1 330 

19 900 

317 338 

5 266 900 

89 837 890 

21 

11 

121 

1 771 

29 161 

611 819 

9 351 001 

175 604 011 

23 

12 

144 

2 300 

41 328 

791 660 

16 787 344 

323 639 900 

25 

13 

169 

2 925 

56 953 

1 182 285 

25 552 969 

667 780 626 

27 

14 

196 

3 664 

76 636 

1 713 726 

39 901 876 

956 201 014 

29 

15 

225 

4 495 

101 025 


60 413 026 

1 550 024 335 

31 

16 

256 

6 456 

130 816 

3 344 528 

89 042 176 

2 437 628 018 

33 

17 

289 

6 645 

166 763 

4 630 449 

128 177 569 

3 728 995 985 

35

18 

324 

7 770 

209 628 


180 699 444 

6 567 261 610 

37 

19 

361 

9 139 

260 281 

7 905 235 

250 043 401 

8 132 988 019 

39 

20 

400 

10 660 

319 600 

218 676 

340 267 600 

11 651 731 780 

41 

21 

441 

12 341 

388 521 

13 044 437 

456 123 801 

16 401 836 021 

43 

22 

484 

14 190 

468 028 

16 463 238 

603 132 244 

22 723 199 070 

45 

23 

629 

16 216 

559 163 

20 563 863 

787 660 369 

31 026 964 695 

47 

24 

576 

18 424 

662 976 

25 443 544 

1 017 005 376 

41 806 180 024 

49 

25 

626 

20 825 

780 626 

31 208 345 

1 299 480 625 ; 

55 647 467 225 

51 

26 


23 426 

913 276 

37 973 546 

1 644 605 876 ! 

73 243 755 026 

53

27 


26 235 

1 062 153 

45 864 027 

2 062 701 369 ! 

95 408 116 156 

55 

28 


29 260 

1 228 528 

55 014 652 

2 565 985 744 

123 088 756 780 

57 

29 

841 

32 609 

1 413 721 

65 570 653 

3 167 677 801 

157 385 204 029 

59 

30 

900 

35 990 

1 619 100 

77 688 014 

3 882 602 100 

199 565 737 67C 

61 

31 

961 

39 711 

1 S46 081 

91 633 855 

4 727 198 401 

251 086 112 031 

63 

32 

1 024 

43 680 

2 096 128 

107 286 816 

5 719 634 944 

313 609 614 240 

65 

33 

1 089 

47 905 

2 370 753 

125 137 441 

6 879 925 569 

389 028 504 865 

67 

34 

1 156 

52 394 

2 671 516 

345 288 562 

8 230 050 676 

479 486 887 034 

69 

35 

1 225 

57 155 

3 000 025 

167 955 683 

9 794 082 025 

687 405 050 116 

71 

36 

1 296 

1 62 196 

3 357 936 

193 367 364 

U 698 311 376 

715 605 334 036 

73 

37 

1 369 

67 625 

3 746 953 


13 671 382 969 

866 839 560 325 

75 

i 38 

1 444 

73 150 

i 4 163 828 

253 406 230 

16 044 429 844 

1 044 818 075 950 

77 

39 

1 621 

79 079 

4 625 361 

288 559 271 

IS 751 214 001 

1 253 240 456 039 

79 

40 


86 320 

5 118 400 

327 309 352 

21 828 270 400 

1 496 327 911 560 

81 

^ 41 

1 681 

91 881 

1 5 649 841 

$70 556 073 

25 316 054 801 

1 778 757 448 041 

83 

42 

1 764 

98 770 

6 221 628 

418 014 394 

29 254 095 444 

2 105 697 821 410 

85 

43 

1 849 

105 995 

8 836 753 


33 691 148 569 

2 482 847 337 035 

87

44 

1 936 

113 664 

7 494 256 

527 504 780 

38 675 357 776 

2 916 473 538 044 

89

45 

2 025 

1 121 485 

8 199 225 

690 247 021 

44 259 417 225 

3 413 454 829 005 

91 

46

2 116 

129 766 

8 952 796 

658 821 982 

50 499 738 676 

3 981 324*081 046 

93 

47 

2 209 

I 138 415 

9 767 163 

733 627 183 

57 458 622 369 

4 628 314 204 495 

95 

48 

2 304 

147 440 

10 614 628 

815 077 808 

65 194 431 744 

5 363 406 165 120 

97 

49 

2 401 

156 849 

11 627 201 

903 607 089 

78 781 772 001 

6 196 378 160 049 

99 

50 

2 500 

■ 

166 660 

12 497 500 

999 666 690 

83 291 672 500 

7 137 858 '309 450 


on. For convenience, the table shows both the highest odd natural
number and M_o. The sums shown here will be used almost exclusively
in connection with the fitting of a trend line to a time series having an
even number of years (or other periods) and where the origin is taken
between the two center X values. Under these conditions: (1) the
largest X value shown in the computation table is the highest odd natural
number and M_o = (highest odd natural number + 1) ÷ 2; (2) the sums
read from the table must be multiplied by 2; and (3) N as used in the
normal equations is 2M_o. X_o means "odd value of X."

The sums of the first six powers of the first Mo odd natural numbers 
may be obtained from the following: 


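$$\sum_{1}^{M_o} X_o = M_o^2$$

$$\sum_{1}^{M_o} X_o^2 = \frac{M_o(4M_o^2-1)}{3}$$

$$\sum_{1}^{M_o} X_o^3 = (2M_o^2-1)\sum_{1}^{M_o} X_o$$

$$\sum_{1}^{M_o} X_o^4 = \left(\frac{12M_o^2-7}{5}\right)\sum_{1}^{M_o} X_o^2$$

$$\sum_{1}^{M_o} X_o^5 = \left(\frac{16M_o^4-20M_o^2+7}{3}\right)\sum_{1}^{M_o} X_o$$

$$\sum_{1}^{M_o} X_o^6 = \left(\frac{48M_o^4-72M_o^2+31}{7}\right)\sum_{1}^{M_o} X_o^2$$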

A table of the sums of the first six powers of the first 100 odd natural
numbers is given in "Formulae for Facilitating Computations in Time
Series Analysis," by Frank A. Ross, Journal of the American Statistical
Association, March, 1925, pp. 75-79.



APPENDIX D 

Ordinates of the Normal Curve 


Erected at Distances x/s from X̄, Expressed as Decimal Fractions of the
Maximum Ordinate Y_o


The maximum ordinate is computed from the expression

$$Y_o = \frac{Ni}{s\sqrt{2\pi}} = \frac{Ni}{2.5066s}$$

The values tabled below result from solving the expression

$$e^{-\frac{x^2}{2s^2}}$$

The proportional height of an ordinate to be erected at any given value on the X axis
can be read from the table by determining x (the deviation of the given value from the
mean) and computing x/s. Thus, if X̄ = $25.00, s = $4.00, Y_o = 1950, and it is
desired to ascertain the height of an ordinate to be erected at $23.00; x = $2.00 and
x/s = $2.00 ÷ $4.00 = 0.50. From the table the ordinate is found to be 0.88250 of the
maximum ordinate Y_o, or 0.88250 × 1950 = 1721.
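
The worked example can be reproduced without the table. The sketch below evaluates the exponential expression directly; the function names are illustrative assumptions, and in `max_ordinate` the arguments `n` and `i` are assumed to be the number of observations and the class interval.

```python
import math


def relative_ordinate(x, s):
    """Height of the normal curve at deviation x, as a fraction of the maximum ordinate."""
    return math.exp(-((x / s) ** 2) / 2)


def max_ordinate(n, i, s):
    """Maximum ordinate N*i / (s * sqrt(2*pi)); n and i are assumed meanings."""
    return n * i / (s * math.sqrt(2 * math.pi))


# Example from the text: mean $25.00, s = $4.00, ordinate wanted at $23.00.
ratio = relative_ordinate(2.00, 4.00)
height = ratio * 1950
print(round(ratio, 5), round(height))
```

The ratio agrees with the tabled entry for x/s = 0.50, and the resulting height rounds to 1721.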
Ordinates of the Normal Curve


x/s | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09

0.0 

1.00000


.99980 

.99955 

.99920 

.99875 

.99820 

.99755 

.9^85 

.99596 

0.1

.99501

.99396 

.99283 

.99158 

.99025 

.98881 

.98728 

.98565 

.98393 

.98211 

0.2 

.98020

.97819 


.97390 

.97161 

.96923 

.96676 

.96420 

.96156 

.95882 

0.3 

.95600 

.95309 

.95010 

mmaim 

.94387 


,93723 

.93382 


.9267? 

0.4 

.92312

.91939 

.91558 

.91169 

.90774 

.90371 

.89961 

.89543 

.S9U9 

Mm 

0.5

.88250 

.87805 

^7353 

.86896 

.86432 

.85962 

.85488 

lilBIl 

.84519 


0.6 

.83527 

.83023 

.82514 


.81481 

.80957 

.80429 

.79896 

.79359 

.7881? 

0,7 

.78270 

.77721 

.77167 


.76048 

.75484 

.74916 

.74342 

.73769 

.73103' 

0.8

.72615 

.72033 

.71448 

.70861 

.70272 

.69681 


.68493 

.67896 

.67298 

0.9

.66698

.66097

.65494 

.64891 

.64287 

.63683 


.62472 

.61865 

.61259 

1.0 

.60653 

.60047 

.59440 

.58834 

.58228 

.57623 

.57017 

.56414 

.55810 

.55209 

1.1

.54607 

.54007 

.63409 

.52812 

.52214 

.61620 

.51027 

.50437 

.49848 

.49260 

1,2 

.48675 

.48092 

.475U 

mmm 

.46357 

.45783 

.45212 

.44644 

.44078 

.43516 

1.3 

.42956 

.42399 

.41845 

.41294 

.40747 

.40202 

.39661 

.39123 

.38589 

.38058 

1.4 

.37531 

.37007 

.36487 

.35971 

.35459 

.34950 

.34445 

,33944 

.33447 

.32954 

1.5

.32465 

.31980 

.31600 

.31023 

.30550 

.30082 

.29618 

.29158 

J28702 

JJ8251 

1.6 

,27804 

.27361 

.26923 

.26489 

.26059 

.25634 

.25213 

.24797 

.24385 

.23978 

1.7 

.23575 

J23176 

.22782 

.22392 

.22008 

.21627 

.21251 

.20879 

.20511 

.20148 

1.8 

.19790 

.19436 

.19086 

.18741 

.18400 

.18064 

.17732 

.17404 

.17081 

.16762 

1.9

.16448 

.16137 

.15831 

.15530 

.16232 

.14939 

.14650 

.14364 

.14083 

.13806 

2.0 

.13534 

.13265 

.13000 

.1274f 

.12483 

! .12230 

.11981 

.11737 

.11496 

.11250 

2.1 

.11025 

.10795 

: .10570 

iriTcglrl 

.10129 

: .09914 

.09702 

.09495 

.00290 

.09090 

2.2 

.08892 

.08698 

.08507 

.08320 

.08136 

,07956 

.07778 

.07604 

.07433 

.07265 

2.3 

.07100 

.06939 

.06780 

.06624 

.06471 

.06321 

.06174 

mmm 

.05888 

.05750 

2.4 

.056X4 

.05481 

.05350 


.05096 

.04973 

.04852 

.04734 

.04618 

.04505 

2.5

.04394 

.04285 

.04179 

.04074 

.03972 

' .03873 

.03775 

.03680 

.03586 

.03494 

2.6 

.03405 

.03317 

.03232 

.03148 

.03066 

.02986 

.02908 

.02831 

.02757 

.02684 

2.7 

.02612 



.02408 

; .02343 

1 .02280 

.02218 

.02157 

.02098 

.02040 

2.8

.01984 

.01929 

.01876 


.01772 

1 .01723 

.01674 

.01627 

1 

.01536 

2.9 

.01492 

.01449 

.01408 

.01367 

.01328 

1 .01288 

,01252 

.01215 

liii 

.01145 


X 

$ 

0 

.1 

1 .2 1 

1 ^ 1 

A 

.5 

.6 

m 

1 l_l 

.9 

3, 

.01111 






.00153 

.00106 


.00050 

4. 

5. 

.00034 

.00000 





Hljllljlyl 

.00003 ! 

.00002 


.00001 


Largely from Rugg's Statistical Methods Applied to Education, by arrangement with the
publishers, Houghton Mifflin Company. More detailed tables of normal-curve ordinates may
be found in E. S. Pearson and H. O. Hartley, Biometrika Tables for Statisticians, Volume I,
Cambridge University Press, Cambridge, 1954, pp. 104-110; in Karl Pearson, Tables for
Statisticians and Biometricians, Part I, The University Press, Cambridge, England, 1948 (third
edition), pp. 2-8; and in Federal Works Agency, Work Projects Administration for the City of
New York, Tables of Probability Functions, National Bureau of Standards, New York, 1942,
Vol. II, pp. 2-238. The values shown in these tables should be multiplied by √(2π) = 2.5066 to
agree with those shown above.


































APPENDIX E 


Areas Under the Normal Curve

From the Arithmetic Mean to Distances* x/s or x/σ from the Arithmetic
Mean, Expressed as Decimal Fractions of the Total Area 1.0000

This table shows
the black area:


x/s or x/σ | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09

0.0 

.0000 

.0040 

.0080 

.0120 

.0160 

.0199 

.0239 

.0279 

.0319 

.0359 

0.1 

.0398 

.0438 

,0478 

.0517 

,0557 

.0596 

.0636 

.0675 

.0714 

.0753 

0.2 

.0793 

.0832 

.0871 

.0910 

.0948 

.0987 

.1026 

.1064 

.1103 

.1141 

0.3 

.1179 

.1217 

,1255 

.1293 

.1331 

.1368 

.1406 

.1443 

.1480 

.1517 

0.4 

.1554 

.1591 

.1623 

.1664 

.1700 

.1736 

.1772 

.1808 

.1844 

.1879 

0.5 

.1916 

.1950 

.1985 

.2019 

.2054 

.2088 

.2123 

.2157 

.2190 

.2224 

0.6 

.2257 

.2291 

.2324 

.2357 

.2389 

.2422 

.2454 

.2486 

.2518 

.2549 

0.7 

.2580 

.2612 

.2642 

.2673 

.2704 

.2734 

.2764 

.2794 

.2823 

.2852 

0.8 

.2881 

.2910 

.2939 

.2967 

.2995 

.3023 

.3051 

.3078 

.3106 

.3133 

0.9 

.3159 

.3186 

.3212 

.3238 

.3264 

.3289 

.3315 

.3340 

.3365 

.3389 

1.0 

.3413 

.3438 

.3461 

.3485 

.3508 

,3531 

.3554 

,3577 

.3599 

.3621 

1,1 

.3643 

.3665 

.3686

.3708 

.3729 

.3749 

.3770 

.3790 

.3810 

,3830 

1.2 

.3849 

.3869 

.3888 

.3907 

.3925 

.3944 

.3962 

.3980 

.3997 

.4015 

1.3 

.4032 

.4049 

.4066 

.4082 

.4099 

.4115 

.4131 

.4147 

.4162 

.4177 

1.4 

.4192 

.4207 

.4222 

.4236 

.4251 

.4265 

.4279 

.4292 

.4306 

.4319 

1.5 

.4332 

.4345 

.4367 

.4370 

.4382 

.4394 

.4406 

.4418 

.4429 

.4443 

1.6 

.4452 

.4463 

.4474 

.4484 

.4495 

.4505 

.4515 

.4525 

.4535 

,4545 

1.7 

,4554 

.4564 

.4573 

.4582 

.4591 

.4599 

.4608 

.4616 

.4625 

.4633 

1.8 

.4641 

.4649 

.4656 

.4664 

.4671 

.4678 

.4686 

.4693 

.4699 

.4706 

1.9 

.4713 

.4719 

.4726 

.4732 

.4738 

.4744 

.4750 

.4756 

.4761 

.4767 

2.0 

.4773 

.4778 

.4783 

.4788 

.4793 

# .4798 

.4803 

.4808 

.4812 

.4817 

2.1 

.4821 

.4826 

.4830 

.4834 

.4838 

.4842 

.4846 

.4850 

.4854

.4857 

2.2 

.4861 

.4864 

.4868 

,4871 

.4875 

.4878 

.4881 

.4884 

.4887 

.4890 

2.3 

.4893 

.4896 

.4898 

.4901 

.4904 

.4906 

.4909

.4911 

.4913 

.4916 

2.4 

.4918 

.4920 

.4922 

.4925 

.4927 

.4929 

.4931 

.4932 

.4934 

.4936 

2.5 

.4938 

.4940 

.4941 

.4943 

.4945 

.4946 

.4948 I 

.4949 

.4951 

.4952 

2.6 

.4953 

.4956 

.4966 

.4957 

.4959 

.4960 

.4961 

.4962 

.4963 

.4964 

2.7 

.4965 

.4966

.4967 

.4968 

.4969 

.4970 i 

.4971 

.4972 

.4973 

.4974 

2.8 

.4974 

.4975 

.4976 

.4977 i 

.4977 

.4978 

.4979 

.4979 

.4980 

.4981 

2.9 

.4981 

.4982 

.4982 

.4983 i 

.4984 

.4984 ; 

.4985 

.4985 

.4986 

.4986 

3.0 

.49865 

.4987 

.4987 

.4988 

.4988 ' 

.4989 

.4989 

.4989 

.4990 

.4990

3.1 

3.2 

3.3

3.4 

3.5 

3.6 

3.7 

3.8 

3.9 

4.0 
4.5 

5.0 

.49903

.4993129 

.4995166 

.4996831 

.4997674 

.4998409 

.4998922 

.4999277 

.4999519 

.4999683 

.4999966 

.4999997133 

.4991 

.4991

.4991 

.4992 

.4992 

.4992 

.4992 

.4993 

.4993 


* The expression x/s is used when fitting a normal curve (pp. 590-607); x/σ is employed when
making a test of significance involving the standard deviation of the population and the normal
curve (pp. 635-642, 663-666, 670-671, 673-675, 679-680, and 723-725).

Largely from Rugg's Statistical Methods Applied to Education (with corrections), by arrangement
with the publishers, Houghton Mifflin Company. A more detailed table of normal-curve
areas, but in two directions from the arithmetic mean, is given in Federal Works Agency, Work
Projects Administration for the City of New York, Tables of Probability Functions, National
Bureau of Standards, New York, 1942, Vol. II, pp. 2-338.
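
Areas of this kind can also be computed rather than read from the table. The following is a minimal sketch based on the error function in Python's standard `math` module; the function name `area_mean_to` is an illustrative assumption.

```python
import math


def area_mean_to(z):
    """Area under the normal curve from the mean out to z = x/s (or x/sigma),
    as a fraction of the total area 1.0000."""
    return 0.5 * math.erf(z / math.sqrt(2))


for z in (0.5, 1.0, 1.96, 3.0):
    print(z, round(area_mean_to(z), 4))
```

For a distance of 1.0 the function returns approximately .3413, and for 3.0 approximately .49865, in agreement with the tabled entries. Recent computation may differ from the printed table by one unit in the last place for a few entries.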
APPENDIX F


Values off, 0) 


For Use in Fitting Curves of the Type 




Ni 




\ ^-Ei— 
J »‘s/2ir 


e 


2.* I 


\1 

2V^ 3.®/J 


x/s | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09

.0 

.00000 


KiSiitW 


mmm 



.00049 

.00064 

.00081 

.1 

.00099 

.00120 

.00143 

.00167 

.00194 

.00222 

.0025? 

.00285 

.00319 

.00355 

.2 

.00392 

.00432 

,00473 

.00516 

.00661 

.00607 

.00656 

.00705 

.00767 

.00810 

.3 

.00865 

.00921 

.00979 

.01038 

.01099 

.01161 

.01225 

.01290 

.01356 

.01424 

.4 

.01493 

.01564 

.01635 

.01708 

.01782 

.01857 

.01933 

.02011 

.02089 

.02168 

.5

.02248 

.02329 

.02411 

.02494 

.02578 

.02662 

.02748 

.02833 

.02920 

.03007 

.6 

.03095 

.03183 

.03272 

.03361 

.03450 

.03540 

.03631 

.03721 

.03812 

.03904 

.7 

.03995 

.04086 

.04178 

.04370 

.04362 

.04463 

.04546 

.04637 

.04728 

.04820 

.8 

,04911 


.05093 

.05183 

.05274 

.05363 

.05453 

.05542 

.05031 

.05719 

.9

.05806 

.06894 

,05980 

.06066 

.06162 

.06236 

msmsm 

.06404 

.06486 

,06568 

1.0 

.06649 

.06729 

.06809 

.06887 

.06965 

.07042 

.07118 

.07193 

.07267 


1.1 

.07412 

.07483 

.07652 

.07621 

.07689 

.07756 

.07822 

.07886 

.07950 

.08012 

1.2 

.08073 

.08133

.08192 

.08250 

.08306 

.08361 

.08416 

.08468 

.08520 

.08671 

1.3 

.08620 

.08668 

.08715 

.08760 

.08805 

.08848 

.08890 

mmm 

.08970 


1,4 

.09045 

.09080 

.09116 

.09148 

.09180 

.09211 

.09241 

.09269 

.09296 


1.5 

.09347 

.09371 

.09394. 

.09415 

.09435 

.09454 

.09472 


.09605 

IRIB 

1.6 

,09533 

,09546 

.09557 

.09567 

.09577 

.09585 

.09692 

.09599 

.09604 

.09608 

1.7 

.09612 

.09014 

.09616 

,09016 

.09610 

.09615 

.09613 ■ 

.09610 

.09606 

.09602 

1.8 

.09597 

.09590 

.09584 

.09576 

.09568 

.09559 

.09549 

.09539 

.09627 

.09516 

1.9 

.09503 

.09490 

.09477 

.09463 

.09448 

.09433 

.09417 

.09401 

.09384 


2.0 

.09349 


.09312 

.09293 

.09273 

.09253 

.09233 

.09213 

,09192 


2.1 

.09149 

.09127 

.09106 

.09082 

.09060 

.09037 

.09014 

.08991 

.08967 

.08943 

2.2 

.08919 

.08895 

.08871 

.08847 

.08823 

.08798 

.08774 

.08749 

08724 

.08699 

2.3 

mmmm 

.08650 

.08625 

.08600 

.08575 

.08550 

,08525 


.08475 

.08450 

2.4 

,08426 

.08401 

,08376 

.08352 

,08327 

.08303 

.08279 

.08265 

.08231 

.08207 

2.5 

.08183 

.08169 

.08136 

.08112 , 

, .08089 

.08068 

.08043 

.08020 

.07098 

.07976 

2.6 

.07953 

.07931 

.07909 

.07888 

.07866 

.07845 

.07824 

mmm 

.07782 


2.7 

.07742 

.07722 

.07702 

.07082 

munmm 

.07644 

.07625 

.07606 

.07588 


2.8 

.07551 

.07534 

.07516 

.07499 

.07482 

.07465 

.07448 

.07432 

.07416 


2.9 

3.0 

3.1 

3.2 

3.3 

3.4 

3.5 

3.6 

3.7 

3.8 

3.9 

4.0 

.07384 

.07240 

.07118 

.07016 

.06933 

.0 GS 66 

.06813 

.06771 

.06739 

.06714 

.06696 

.06683 

.07369 

.07364 

.07339 

.07324 

.07300 

.07295 

.07281 

.07257 

.07254 


From W. A. Shewhart, Economic Control of Quality of Manufactured Product, p. 91,
D. Van Nostrand Company, Inc., New York, 1931. Courtesy of D. Van Nostrand
Company, Inc., and The Bell Telephone Laboratories.

For values of F₂ beyond the range shown above, use the expression

$$F_2 = \frac{1}{6\sqrt{2\pi}}\left[1 - \left(1 - \frac{x^2}{s^2}\right)e^{-\frac{x^2}{2s^2}}\right]$$

Values of e^(-x²/2s²) may be conveniently read from the table of ordinates of the normal curve,
Appendix D, or from a more extensive table in E. S. Pearson and H. O. Hartley,
Biometrika Tables for Statisticians, Volume I, Cambridge University Press, Cambridge,
1954, pp. 104-110, and in Karl Pearson, Tables for Statisticians and Biometricians,
Part I, The University Press, Cambridge, England, 1948 (third edition), pp. 2-8.

The values shown in the last two tables yield e^(-x²/2s²) when multiplied by 2.5066.
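
Values of F₂ can be spot-checked numerically. The sketch below assumes the closed form F₂ = [1 − (1 − z²)e^(−z²/2)] ÷ 6√(2π), with z = x/s; this form is an assumption that reproduces the tabled entries tested here, and the function name `f2` is illustrative.

```python
import math


def f2(z):
    """Assumed closed form for the tabled function, with z = x/s."""
    return (1 - (1 - z * z) * math.exp(-z * z / 2)) / (6 * math.sqrt(2 * math.pi))


for z in (0.0, 0.5, 1.0, 1.7):
    print(z, round(f2(z), 5))
```

At z = 1.0 the bracketed term equals 1, so the value reduces to 1 ÷ 6√(2π), or about .06649, which is the tabled entry for 1.0.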
APPENDIX G


Areas in One Tail of the Normal
Curve at Selected Values* of x/s or x/σ
from the Arithmetic Mean


This table shows
the black area:


x/s or x/σ | .00 | .01 | .02 | .03 | .04 | .05 | .06 | .07 | .08 | .09

0.0 


.4960 


.4880 

.4840 

.4801 

.4761 

.4721 

.4681 

.4641 

0.1

.4602 

.4562 

.4522 

.4483 


KiSl 

.4364 

.4325 

.4286 

.4247 

0.2 


,4168 

.4129 

.4090 


mmm 

.3974 

.3936 

.3897 

.3859 

0.3 

.3821 

.3783 

.3746 

.3707 

,3669 

,3632 

.3594 

.3557 

.3520 

.3483 

0.4 

.3446 

.3409 

.3372 

.3336 


.3264 

.3228 


.3156 

.3121 

0.5 

.3085 

.3050 

.3015 

.2981 

.2946 

.2912 

.2877 

,2843 

.2810 

.2776 

0.6 

.2743 

.2709 

.2676 

,2643 

.2611 

.2678 

,2546 

.2514 

.2483 

.2451 

0.7 

.2420 

.2389 

.2368 

.2327 

.2296 

.2266 

.2236 

.2206 

.2177 

.2148 

0.8 

.2119 

.2090 

.2061 

,2033 


.1977 

.1949 

.1922 

.1894 

.1867 

0.9 

.1841 

.1814 

.1788 

.1762 

,1736 

.1711 

.1685 

.1660 

.1635 

.1611 

1.0 

.1587 

.1562 

.1539 

.1615 

.1493 

.1469 

.1446 

.1423 

.1401 

.1379 

1.1 

.1357 

.1336 

.1314 

.1292 

.1271 

.1251 

.1230 

.1210 

.1190 

.1170 

1.2 

.1151 

.1131 

,1112 

.1093 

.1075 

.1056 

.1038 

.1020 


.0985 

1.3 

.0968 

.0951 

.0934 

.0918 

.0901 

.0885 

.0869 

.0853 

.0838 

.0823 

1.4 

.0808 

.0793 

.0778 

.0764 

.0749 

.0735 

.0721 

.0708 

.0694 

.0681 

1.5 

.0668 

.0655 

.0643 


,0618 

^.0606 

.0594 

.0582 

.0571 

.0559 

1.6 

.0548 

.0537 

.0526 

.0516 

.0505 

.0495 


.0475 

.0465 

.0455 

1.7 

.0446 

.0436 

.0427 

.0418 

.0409 

.0401 


mm:im 

.0375 

.0367 

1.8 

.0359 

.0351 

,0344 

.0336 

.0329 

,0322 

.0314 

.0307 

.0301 

.0294 

1.9 

.0287 

.0281 

1 .0274 

.0268 

.0262 

.0256 

.0250 

.0244 


.0233 

2.0 

.0228 

.0222 

.0217 

.0212 


.0202 


.0192 

.0188 

.0183 

2,1 

.0179 

.0174

.0170 

.0166 

.0162 

.0158 

.0154 

.0160 

.0146 

,0143 

2.2 

.0139 

.0136 

.0132 

.0129 

.0125 

.0122 

.0119 

.0116 

.0113 

.0110 

2.3 

.0107 

.0104 

.0102 

■rctiiuit; 






.00842 

2.4 

.00820 

.00798 

.00776 

KW 

WMM 



|k| 


.00639 

2.5 

KRH 

.00604 

.00587 


IKI 

.00539 

hBH 



.00480 

2.6 

RvrVr 

HrnT' 


HTflr 

.00415 

.00402 




.00357 

2.7 



Ky ^I* 



.00298 




.00264 

2.8 

.00256 



,00233 


.00219 

.00212 



.00193 

2.9 

.00187 


Bl 

.00169 

Hi 


Hi 

Hi 

Hi 

.00139 


*<w i 

S f 

.0 

.1 

s 

.8 

.4 

.5 

.6 

D 

.8 

J 

8 

.00135 

.0*968 


.0*483 

.0*337 

.0*233 

.0*159 

,0*108 

,0*723 

.0*481 

4 

.0*317 

.0^07 

.0*133 

,0»854 

.0*541 

.0*340 

.0*211 

.0*130 

.0*793 

.0*479 

5 

.0«287 

.0*170 

.ow 

.0*579 

.0*333 

.0n90 

.OW 

,0*699 


.0*182 

6 

.0*987 

.0*M0 


.0*149 

.0**777 

,0**402 

.0*<206 

.0**104 

,0**523 

.0**260 


* See note to Appendix E.

From Tables of Areas in Two Tails and in One Tail of the Normal Curve, by Frederick E.
Croxton. Copyright, 1949, by Prentice-Hall, Inc. Permission is given to reproduce this table
provided credit is given to the author and provided the Prentice-Hall copyright line is included.
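
One-tail areas can be computed from the complementary error function, which keeps its accuracy far out in the tail where the areas are very small. The following is a minimal sketch; the function name `one_tail_area` is an illustrative assumption.

```python
import math


def one_tail_area(z):
    """Area in one tail of the normal curve beyond z = x/s (or x/sigma)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


for z in (1.0, 1.96, 3.0, 4.0):
    print(z, one_tail_area(z))
```

The one-tail area and the area from the mean to the same distance (Appendix E) always add to .5000; for example, .1587 + .3413 at a distance of 1.0.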
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*2 

.0 

.CK)137 

.0*267 

*0»i99 

.0*665 

,0*967 

.0*171 

.oni6 

.0»29$ 


.0«318 .0»216 .0*145 .0*962 
.0*422 .OW .0*159 .0*958 
.0*214 .0*120 .0*663 .0*384 
.0>*803l .0**208 .0**105 .0>'620 


* See note to Appendix E.

From Tables of Areas in Two Tails and in One Tail of the Normal Curve, by Frederick E.
Croxton. Copyright, 1949, by Prentice-Hall, Inc. Permission is given to reproduce this table
provided credit is given to the author and provided the Prentice-Hall copyright line is included.
APPENDIX


Values

For Given Degrees of Freedom (n) and


This table shows the black 


Level of significance (P) 


n | .90 | .80 | .70 | .60 | .50 | .40 | .30 | .25

1 

.158 

.325 


.727 

■KiMii 

1.376 

1.963 

2.414 

2 

.142 

.289 

.445 

.617 

.816 


1.386 

1.604 

3

.137 

.277 

.424 

.584 

.765 

.978 


1.423 

4 

.134 

.271 

.414 

.669 

.741 

.941 


1.344 

5 

.132 

.267 


.659 

.727 


1.156 

1.301 

6 

.131 

.265 

.404 

.653 

.718 

.906 

1.134 

1.273 

7 

.130 

.263 


.549 

.711 

.896 

1.119 

1.254 

8 

.130 

.262 

.399 

.646 

.706 

.889 

1.108 

1.240 

9 

.129 

.261 

.398 

.543 

.703 

.883 


1.230 

10 

.129 

.260 

,397 

.542 


.879 

1.093 

1.221 

11 

.129 

.260 

.396 

.540 

.697 

.876 

1.088 

1.214 

12 

.128 

.259 

.395 

.539 

.695 

.873 

1.083 

1.209 

13 

.128 

.259 

.394 

.538 

.694 


1.079 

1.204 

14 

.128 

.258 

.393 

.537 

.692 

.868 

1.076


15 

.128 

.258 

.393 

.536 

.691 


1.074 

1.197 

16 

.128 

.258 

.392 

.635 

.690 

.865 

1.071 

1.194 

17 

.128 

.257 

.392 

.534 

.689 

.863 

1.069 

1.191 

18 

.127 

.257 

.392 

.634 

.688 

.862 

1.067 

1.189 

19 

.127 

.257 

.391 


.688 

.861 

1.066 

1.187 

20

.127

.257 

.391 


.687 

.860 

1.064 

1.185 

21 

.127 

.257 

,391 


.686 

.859 

1.063 

1.183 

22 

.127 

.256 

,390 


.686 

.858 

1.061 

1.182 

23 

.127 

.256 

.390 


.685 

.858 

1.060 


24 

.127 

.256 

,390 

.531 

.685 

.857 

1.059 

1.179 

25 

.127 

.256 

.390 

.531 

.684 

.856 

1.058 

1.178 

261 

.127 

.256 


.531 

.684 

.856 

1.058 

1.177 

27' 

.127 

.256 

.389 

.531 

.684 

.855 

1.057 

1.176 

28 

.127 

.256 

.389 

.630 

.683 

.855 

1.056 

1.175 

29' 

.127 

.266 

.389 

.530 

.683 

.854 

1.055 

1.174 

30

.127 

.256 


.530 

.683^ 

.854 

1.055 

1.173 


.126 

.255 

.388 

.529 

.681 

.851 

1.050 

1.167 



.254 

.387 

.527 

.679: 

.848 

1.046 

1.162 


.126 

.254 

.386 

.526 

.6771 

.845 

1.041 

1.156 

∞

.126 

.253 

.385 

.524 

.674 

.842: 

1.0361 



The values in this table were taken, by permission, from Statistical
Tables for Biological, Agricultural, and Medical Research, by R. A. Fisher
and F. Yates, published by Oliver and Boyd, Edinburgh, and from Biometrika,
Vol. XXXII, April 1942, p. 300, "Table of Percentage Points of
the t-distribution," by Maxine Merrington. A table of t, similar in

m 



















I 


of t 



arrangement to that of Appendix E, giving areas of the t distribution from the
mean to t (in one direction) and for n = 1 to n = 20 may be found in "New
Tables for Testing the Significance of Observations," by "Student," Metron, Vol. V,
No. 3 (1925), pages 114-118.
APPENDIX


Values 

For Given Degrees of Freedom 


This table shows 
the black area: 


for n = 1 and n = 2,













J 

of χ²

(n) and for Specified Values of P



for n ≥ 3.


Value of P 


.30 | .25 | .20 | .10 | .05 | .025 | .02 | .01 | .005 | .001 | n

1.074 

1.323 

1.642 

2.706 

3.841 

5.024 

5.412

6.635

7.879 

10.827 

1 

2.408 

2,773 

1 3.219 

4.605 

5.991 

7.378 

1 7.824 

9.210 

10.697 

13.816 

2 

3.665 

4.108 

4.642 

6.251 

7.816 

9,348 

9.837 

11.345 

12.838 

16.268 

, 3 

4.878 

5.385 

i 5.989 

7.779 

9.488 

11.143 

11.668 

13.277 

14.860 

1 18.465 

4 

6.064 

6.626 

7.289 

9.236 

11.070 

12.832 

13.388 

1 16.086 

16.760 

1 20.617 

6 

7*231 

7,841 

^ 8.658 

10.645 

12,592 

14.449 

15.033 

1 16.812 

’ 18.548 

22.457 

6 

8.883 

9,037 

9,803 

12.017 

14.067 

18.013 

16.622 

1 18.475 

’ 20.278 

24.322 

7 

9.524 

10.219 

11.030 

13.362 

15.507 

17.535 

18.168 

20.090 

1 21.956 

26.125 

$ 

10.666 

I 11.389 

12.242 

14.684 

16.919 

i 19.023 

19.679 

21.666 

1 23.589 

27.877 

0 

11.781 

12.649 

13.442 

15.987 

18.307 

1 20.483 

21.161 

23.209 

25.188 

29.588 


12.899 

13.701 

14.631 

17.275 

19.676 

21.920 

22.618 

24.725 

26.767 

31.264 

1 11 

14.011 

14.846 

15.812 

18.549 

21.026 

23.337 

24.054 

26.217 

28.300 

32.909 

12 

15.119 

16.984 

16.985 

19.812 

22.862 

24.736 

25.472 

27.688 

29.819 

34.528 

13 

16.222 

17.117 

18.151 

21.0641 

23.685 

26.119 

26.873 

29,141 

31.319 

36.123 

14 

17.322 

18.245 

19.811 

22.307 1 

24.996 

27.488 

28.259 

80.678 

32,801 

87.697 

16 

18.418 

19.369 

20.465 

23.542 

26.296 

28.8415 

29.633 

32.000 

34,267 

39.252 

16 

19.511 

20.489 

21.615 

24.7691 

27.587 

30.191 

30.995 

33.409 

35.718 

40.790 

17 

20.601 

21.606 

22.760 

25.9891 

28.869 

31.526 

32.346 

34.805 

37.166 

42.312 

18 

21.689 

22,718 

23.900 

27.204’ 

30.144 

32.852 

33.687 

36.191 

38.682 

43.820 

19 

22.776 

23.828 

25.038 

28.412 

31.410 

34,170 

36.020 

37.666; 

39.997 

45.315 

20 

23.858 

24.935 

26. 171 

29.615 

32.671 

35.479 

30.3431 

38.932 

41.401 

46.797 ; 

21 

24.939 

26.039 

27.301 

30.813 

33.924 

36.781 

37.659 

40,289 

42.796 

48.268 

22 

26.018 

27.141 

28.429 

32.007 

35.172 

38.070 

38.968 

41.638 

44.181 

49.728 

23 

27.096 

28.241 

29.653 

33,196 

36.415 

39.364 

40.270 

42.980 

45.558 

51.179 

24 

28.172 

29.339 

30,676 

34.382 

37.652 1 

40.646 

41.566 

44.314 

48.928 

52,620 

26 

29.246 

30.434 

31.795 

35.563 

38.886 1 

41.923 

42.856 

45.642 

48.290 

54.052 

26 

30.319 

31.628 

32.912 

36.741 

40,1131 

43.194 

44.140 

46.963 

49.646 

55.476 

27 

31.391 

32.620 

34,027 

37.916 

41.337 

44.461 

45.419 

48.278 

50,993 

66.893 

28 

32.461 

33.711 

35,139 

39.087 

42.557 

45.722 

46 693 

49.588 

52.336 

68.302 

29 

33.530 

34.800 

36.250 

40.256 

43.773 

46.979 

•47.962 

50.892 

63.672 

69.703 

30 


This table is taken by consent from Table IV of Statistical Tables for Biological, Agricultural, 
and Medical Research, by R. A. Fisher and F. Yates, published by Oliver and Boyd, Edinburgh; 
from Biometrika, Vol. 32, pp. 187-191, “Table of Percentage Points of the χ² Distribution,” by 
Catherine M. Thompson; and from Biometrika, Vol. 40, p. 421, “99.9 and 0.1% Points of the χ² 
Distribution,” by T. Lewis. The values shown in Miss Thompson's table (and the values at 
the 0.001 point as well) may also be found in E. S. Pearson and H. O. Hartley, Biometrika Tables 
for Statisticians, Volume I, Cambridge University Press, Cambridge, 1954, pp. 130-131. 
APPENDIX 

Values of χ²/n for Use in Determining 

This table shows 
the black areas: 

Lower points 

   n    .001    .005     .01    .025     .05     .10     .25     .50 

1 

.0 U 57 

.0*3927 

.0 n 571 

.039821 

.003932 

.01579 

.1015 

.4549 

2 

.001000 

,005013 

.01005 

.02532 

.05129 

.1054 

.2877 

.6931 

3 

.008099 

.02391 

.03828 

.07193 

.1173 

.1948 

.4042 

.7887 

4 

.02270 

.05175 

.07428 

.1211 

.1777 

.2659 

.4806 

.8392 

& 

.04204 

.08235 

.1109 

.1662 

.2291 

.3221 

.6349 

.8703 

6 

.06351 

.1126 

.1453 

.2062 

.2726 

.3674 

.5758 

.8914 

7 

.08550 

.1413 

.1770 

.2414 

.3096 

.4047 

.6078 

.9065 

8 

.1071 

.1681 

.2058 

.2725 

,3416 

.4362 

.6338 

.9180 

9 

.1280 

,1928 

.2320 

.3000 

.3695 

.4631 

,6554 

.9270 

10 

.1479 

.2156 

.2558 

.3247 

.3940 

.4865 

.6737 

.9342 

n 

.1667 

.2367 

.2776 

.3469 

.4159 

.6071 

.6895 

.9401 

.12 

.1845 

.2562 

.2975 

.3670 

.4355 

.6253 

.7032 

.9460 

13 

.2013 

.2742 

.3159 

.3853 

.4532 

.6417 

:7153 

.9492 

14 

.2172 

.2910 

.3329 

.4021 

.4693 

.6564 

.7261 

.9528 

15 

.2322 

.3067 

.3486 

,4175 

.4841 

.6698 

.7358 

.9569 

16 

.2464 

.3214 

.3633 

.4317 

.4976 

.5820 

.7445 

.9587 

17 

.2598 

.3351 

.3769 

.4450 

.6101 

.5932 

.7525 

.9611 

18 

.2725 

.3480 

.3897 

.4573 

.5217 

.6036 

.7597 

,9632 

19 

.2846 

.3602 

.4017 

,4688 

.6325 

.6132 

.7664 

.9661 

20 

.2961 

.3717 

.4130 

,4755 

,6425 

.6221 

.7726 

.9669 

21 

.3070 

.3826 

.4237 

.4897 

.6520 

.6305 

.7783 

.9684 

22 

.3174 

.3929 

.4337 

.4992 

.6608 

: .6382 

.7836 

.9699 

23 

.3274 

.4026 

,4433 

.5082 

.6692 

.6456 

.7886 

.9712 

24 

.3369 ! 

.4119 

.4524 

.5167 

.5770 

.6524 

.7932 

.9724 

25 

.3460 

.4208 

.4610 

.5248 

.5845 

.6689 

,7976 

.9735 

26 

.3547 

.4292 

,4692 

.5325 

.6915 

.6651 

.8017 

.9745 

27 

.3631 

.4373 

.4770 

.5398 

.6982 

! .6709 

,8055 

.9754 

28 

.3711 

.4450 

.4845 

.5467 

.6046 

1 .6764 

.8092 

.9763 

29 

.3788 

.4525 

.4916 

.5533 

.6106 

I .6816 

.8126 

.9771 

30 

.3863 

.4596 

.4984 

.5697 

.6164 

.6866 

.8159 

.9779 

40 

.4479 

.5177 

.5541 

.6108 

.6627 

.7263 

.8415 

.9834 

50 

.4935 

.5598 

,5941 

.6471 

,6953 

,7538 

.8588 

.9867 

60 

.5290 

.5922 

,6247 

.6747 

.7198 

.7743 

.8716 

.9889 

70 

.5577 

.6182 

.6492 

.6965 

.7391 

.7904 

.8814 

.9905 

80 

.5815 

.6396 

.6692 

.7144 

.7549 

.8036 

.8893 

.9917 

90 

.6017 

.6577 

,6862 

,7294 

.7681 

.8143 

,8958 

.9926 

100 

.6192 

,6733 

.7006 

.7422 

.7793 

.8236 

.9013 

.9933 

«e> 

1.0000 

1,0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

1.0000 

<r 

-3.0902 

-2.5758 

-2.3263 

-1.9600 

-1.6449 

-1.2816 

- .6746 

0 


* When n > 30, values of χ²/n may be approximated by use of the expression 





K 

Sampling Limits of 


and 


Upper pointa 


.25 

1HD9IH 

.06 

.025 

.01 

■■BSSSBI 

.001 

n 

1.323 


3.841 

5.024 

6.635 

7.879 

10,827 

1 

1.3 d 6 

2.303 

2.996 

3.689 

4.605 

5.298 

6.908 

2 

1.369 

2.084 

2.605 

3.116 

3.782 

4.279 

5 423 

3 

1.346 

1.945 

2.372 

2.786 

3.319 

3.715 

4.616 

4 

1.325 

1.847 

2.214 

2.566 

3.017 

3.350 

4.103 

5 

1.307 

1.774 

2.099 

2.408 

2.802 

3,091 

3.743 

6 

1.291 

1.717 

2.010 

2.288 

2.639 

2.897 

3.475 

7 

1.277 

1.670 

1.938 

2.192 

2.511 

2.744 

3.266 

8 

1.265 

1.632 

1.880 

2.114 

2.407 

2.621 

3.097 

9 

1.255 

1.599 

1.831 

2.048 

2.321 

2.519 

2.959 

10 


1.570 

1.789 

1,993 

2.248 

2.432 

2.842 

11 


1.548 

1.752 

1,945 

2.185 

2.358 

2.742 

12 


1,524 

1.720 

1.903 

2,130 

2.294 

2.656 

13 


1.505 

1.692 


2.082 

2.237 

2 580 

14 


1.487 

1.666 

1.833 

2.039 

2,187 

2.513 

15 

1.211 

1.471 

1.644 


2.000 

2.142 

2.453 

16 



1.623 

1.776 

1.965 

2.101 

2.399 

17 



1.604 

1.751 

1.934 

2.064 

2.351 

18 

FI 


1,586 

1.729 

1.905 

2.031 

2.306 

19 

1.191 


1.571 

1.708 

1.878 

2.000 

2,266 

20 

1.187 

1.410 

1.556 

1.689 

1.854 

1,971 

2.228 

21 

1.184 

1.401 

1.542 

1.672 

1.831 

1.945 

2,194 

22 

1.180 

1.392 

1.529 



1.921 

2.162 

23 

1.177 

1.383 

1.517 

1.640 

1.791 

1.898 

2.132 

24 

1.174 

1.375 

1.506 

1.626 

1.773 

1.877 

2.105 

26 

i.m 

1.368 

1,496 

1.612 

1.755 

1.857 

2,079 

26 

1.168 

1.361 

1.486 


1.739 

1.839 

2.055 

27 

1.165 

1.354 

1.476 

1.588 

1,724 

1.821 

2.032 

28 

1.162 

1.348 

1.467 

1.577 

1.710 

1.805 


1 29 

1.160 

1.342 

1.459 

1.566 

1.696 

1.789 



1,140 

1.295 

1.394 

1,484 

1,592 

1.669 

1.835 


1.127 

1.263 

1.350 

1.428 


1.590 

1.733 

50 

1.116 

1.240 

1.318 

1.388 


1.533 

1.660 


1.108 

1.222 

1.293 

1.357 

1.435 

1.489 

1.605 

HI 

1.102 

1.207 

1,273 

1.333 

1.404 

1.454 

1.560 

80 

1.096 

1,195 

1.257 

1.313 


1.426 

1.525 

Mil 

1.091 

1.185 

1.243 

1.296 


1.402 

1.494 

BfifiB 

1.000 


1.000 

1.000 


1.000 

1,000 

«0 

4 * .6745 

+ 1.2816 

+ 1.6449 

+1.9600 

+2.3263 

+2.6758 

+3.0902 

0 


' as 

where x/σ is the normal deviate cutting off the corresponding tail of a normal distribution. 

The values in this table were computed from values of χ² given in the references mentioned in Appendix J, 
by use of the expression χ²/n. 
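
The large-n approximation referred to in the footnote can be sketched in code. The expression printed in the source is not legible in this text, so the block below assumes the Wilson-Hilferty cube-root formula, a standard approximation for χ²/n; it may not be the exact form the table's authors had in mind. The function name `chi2_over_n_approx` is illustrative only, and `z` stands for the normal deviate x/σ given in the bottom row of the table.

```python
import math


def chi2_over_n_approx(n: int, z: float) -> float:
    """Approximate chi-square / n for n degrees of freedom.

    Assumption: Wilson-Hilferty cube-root approximation, used here as a
    stand-in for the expression that is missing from the source text.
    z is the normal deviate cutting off the corresponding tail.
    """
    c = 2.0 / (9.0 * n)
    return (1.0 - c + z * math.sqrt(c)) ** 3


# Example: n = 100 at the upper 0.05 point (normal deviate +1.6449).
print(round(chi2_over_n_approx(100, 1.6449), 3))
```

For n = 100 at the upper 0.05 point the sketch gives a value of about 1.243, which agrees with the tabulated entry; agreement is looser in the extreme tails and for smaller n.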
APPENDIX 



1 . _ ^ 

.001 

.005 

.01 

.0924 

.1269 

.1607 

.1448 

.1887 

.2171 

.1844 

.2337 

.2644 

.2166 

.2692 

,8013 

.2437 

.2985 

.3314 

2672 

.3235 

.3569 

.2878 

.3452 

.3789 

.3062 

.3644 

.3982 

.3228 

.8815 

.4154 

.3380 

.3970 

.4309 

.3518 

.4111 

.4449 

.3646 

.4240 

.4577 

.3765 

.4360 

.4695 

.3876 

.4470 

.4804 

.3979 

.4573 

.4906 

.4076 

.4669 

.5000 

.4168 

.4759 

.5088 

.4254 

.4844 

.6172 

.4336 

.4925 

.5250 

.4414 

.5000 

.5324 

.4487 

.5072 

.6394 

.4558 

.5141 

.5460 

.4625 

.5206 

.6524 

.4689 

.5268 

.6584 

.4751 

.5327 

.6642 

.4810 

.5384 

.5697 

.4867 1 

.5439 

.5749 

.4922 : 

.5491 

.5800 

.4974 

.5542 

.6848 

.5025 

.5590 

.5895 

.5449 

.6991 

.6280 

.5770 

.6290 

.6566 

.6024 

.6525 ; 

.6789 

.6232 

.6717 

.6970 

.6408 

.6878 1 

.7122 

.6559 

.7015 

.7261 

.6691 

.7134 

.7363 

1.0000 

1.0000 

1.0000 

- f 3 0902 

+2.5758 

+2.3263 


Lower limits 

.025 .05 

. 1990 . 2603 

.2711 .3338 

.3209 .3839 

.3590 .4216 

.3896 .4517 


4 * 1.2816 4 " ,6745 


* When n > 30, values of n/χ² may be approximated by use of the expression 


/on - 2 4 










L 

Confidence Limits of 




where x/σ is the corresponding normal deviate. 

The values in this table were computed from values of χ² given in the references mentioned in Appendix J, 
by use of the expression n/χ². 
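
As a quick check on how the entries of this table relate to Appendix J, the following minimal sketch recomputes two of them. It assumes each entry is simply n divided by the corresponding χ² value, which is consistent with the tabulated figures; the helper name `n_over_chi2` is hypothetical.

```python
def n_over_chi2(n: int, chi2: float) -> float:
    """Return n / chi-square, the quantity tabulated in this appendix (assumed)."""
    return n / chi2


# Chi-square values at P = .001 for n = 1 and n = 2, copied from Appendix J.
for n, chi2 in [(1, 10.827), (2, 13.816)]:
    print(n, round(n_over_chi2(n, chi2), 4))
```

The two results, .0924 and .1448, match the first two entries of the .001 column.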


m 










APPENDIX M 
Values of F 

For Given Degrees of Freedom (n₁ and n₂) and at Selected Upper Points 

Values of F for corresponding lower points may be obtained by transposing the 
values of n₁ and n₂ and computing 1/F. 




for ni 
and ni 


for 

ni ^ 3 , 





m * 

1 




m « 

2 


ns 

MRii 

.05 

WEm\ 

.01 

.001 

.10 _ 

.05 

.025 

.01 

.001 

1 

39.864 

161.45 

647.79 

4,052.2 

405,284 



799 50 

4,999.6 


2 

8-526 

18.513 

38.506 


998. 5 



WVKiItM 


999.0 

3 

5.538 

10. 128 

17.443 

34.116 

167.0 

5.462 

9.552 



148.5 

4 

4,545 

7.709 

12.218 

21.198 

74.14 

4.325 

6.944 



61.25 

5 

4.060 

6.608 

10.007 

16.258 

47.18 


6.786 

8.434 

13.274 

37.12 

8 

3.776 

5.987 

8.813 

13.745 

35.51 

3.463 

5.143 


10.925 


7 

3.5S9 

5.591 

8.073 

12.246 

29.25 

3.257 

4.737 

6.542 

9.647 

21.69 

8 

3.458 

5.318 

7.571 

11.259 

25.42 

3.113 

4.459 

6.060 

8.649 

18.49 

9 

3.360 

5.117 

7.209 

10.561 

22.86 


4.256 

5.715 


16.39 

10 

3.285 

4.966 

6.937 


21.04 



5.456 

7.559 

14.91 

U 

3.225 

4.844 

6.724 

9.646 

19.69 


3.982 

5.256 

7.206 

13.81 

12 

3.176 

4.747 

6.554 


18.64 

2.807 

3.885 

5.098 

6.927 

12.97 

13 

3.136 

4.667 

6.414 


U.si 

2.763 

3.806 

4.965 

6.701 

12.31 

14 

3.102 

4.600 

6.298 

8.862 

17.14 

2.726 

3.739 

4.857 

6.515 

11.78 

15 

3.073 

4.543 

6.200 

8.683 

16.59 

2.695 

3.682 

4.766 

6.359 

11.34 

16 

3.048 

4 494 

6.115 

8.531 

16.12 

2.668 

3.634 

4,687 

6.226 

10.97 

17 

3.026 

4.451 

6.042 


15.72 

2.645 

3.592 

4.619 

6.112 

10.66 

18 

3.007 

4.414 

5.978 

8.285 

15.38 

2.624 

3,555 

4.560 

6.013 

10.39 

19 

2.990 

4.381 

5.922 

8.185 

15.08 

2.606 

3.522 

4.508 

5.926 

10.16 

20 

2.975 


5.872 

8.096 

14.82 

2.589 

3.493 

4.461 

5.849 

9.96 

21 

2.961 


5.827 


14.59 

2.575 

3.467 

4.420 

5.780 

9.77 

22 

2.949 

4.301 

5.786 

7,945 

14.38 

2.561 

3.443 

4.383 

5.719 

9.61 

23 

2.937: 

4.279 

5.750 

7.881 

14.19 

2.549 

3.422 

4.349 

5.664 

9.47 

24 



5.717 

7.823 

14.03 

2.538 

KKl] 

4.319 

5.614 

9.34 

25 

2.918 

4.242 

6.686 

7.770 

13.88 

2,528 

3.385 

4.291 

5.668 

9.22 

26 

2,909 

4.225 

5.659 

7.721 

13.74 

2.519 

3.369 

4.266 

5.526 

9.12 

27 



5.633 

7.677 

13.61 

2.511 

3.354 

4.242 

5.488 

9.02 

28 

2.894 

4.196 

6.610 

7.636 

13.50 


3.340 

4.220 

5.453 

8.93 


2.887 

4.183 

6.588 

7.598 

13,89 

2.496 

8.328 

4.201 

5.421 

8.85 

30 

2.881 

4.171 

6.568 

7.663 

13.29 

2.489 

8.316 

4.182 

5.390 

8.77 

40 

2.835 


6.424 

7.314 

12.81 

\wKM 

3.232 

4,051 

5.178 

8.25 

60 

2.791 


6,286 


11.97 

2.393 

8.150 

3,925 

4.977 

7,76 

120 

2.748 

3.920 

6,152 

6.851 

11.38 

2.347 

3.072 

3.805 

4.786 

7.32 

m 


3.841 

6.024 

6.635 

10.83 

m9JM 

2.996 

3.689 

4.605 

6.91 


Values of F at the 0.10, 0.05, 0.025, and 0.01 points were taken, by permission, from Biometrika, Vol. XXXIII, 
APPENDIX M-Continued 

Values of F 

For Given Degrees of Freedom (n₁ and n₂) and at Selected Upper Points 

Values of F for corresponding lower points may be obtained by transposing the 
values of n₁ and n₂ and computing 1/F. 





Wl « 

3 




rt, m 

4 


nt 

.10 

.05 

i^i 

.01 1 

.001 

mm 

■H 

.025 

.01 


1 

53.593 

215,71 

864.16 

mmm 

540,379 

55 833 

224 58 

899.58 

5.624.6 


2 

9.162 

19,164 

39.165 

99.166 

999.2 

9.243 

19.247 

39.248 

99.249 

999.2 

3 

5.391 

9.277 

15.439 

29.457 

141.1 

6.343 

9.117 

15. 101 

28.710 

137.1 

4 

4.191 

6.591 

9.979 

16,694 

56.18 

4.107 

6.388 


15.977 

53.44 

5 

3.620 

5.410 

7.764 


33.20 

3.520 

5.192 

7.386 

11.392 

31.09 

6 

3.289 

4.757 

6.599 

9.779 

23.70 

3.181 

4.534 

6.227 

9.148 

21.02 

7 

3.074 

4.347 

5 890 

S .451 

18.77 

2.960 

4.120 

5.523 

7.847 

17.19 

8 

2.924 

4.068 

5.416 

7.591 

15.83 

2.806 

3.838 

5.063 


14.39 

9 

2.813 

3.863 

5.078 

6.992 

13.90 

2.693 

3.633 

4.718 

6.422 

12.56 

10 

2.728 

3.708 

4.826 

6.562 

12.55 

2.605 

3.478 

4.468 

5.994 

1 L 28 

n 

2.660 

3.587 

4.630 

6.217 

11.56 

2.536 

3.357 

4.275 

5.668 

10.35 

12 

2.606 

3.490 

4.474 

5.953 

10.80 

2.480 

3.259 

4.121 

5.412 

9.63 

13 

2.560 

3.410 

4.347 

5.739 

10.21 

2.434 

3.179 

3.996 

6.205 

9.07 

14 

2.522 

3.344 

4.242 

5.564 

9.73 

2.395 

3.112 

3.892 

5.035 

8.62 

15 

2,490 

3.287 

4.153 

6.4 X 7 

9.34 

2.361 

’ 3. C 56 

3.804 

4.893 

3.25 

16 

2.462 

3,239 

4.077 

5,292 

Voo 

2.333 

3.007 

3.729 

4.773 

7.94 

17 

2.437 

3.197 

4.0 U 

5.185 

8.73 

2.308 

KIMJ 


4.669 

7.68 

18 

2.416 

3.160 

3.954 


8.49 

2,236 

2.928 


4.679 

7.46 

19 

2.397 

3.127 

3.903 


8.28 

2.266 

2.895 

3.559 


7.26 

20 

2.380 

3,098 

3.859 

4.938 

8 . 1c 

2.249 

2.866 

3.515 

4.431 

7.10 

21 

2,365 

3.072 

3.819 

4.874 

7.94 i 

2,233 


3.475 

4.369 

8.95 

22 

2.351 

8.049 

3.783 

4.817 

7.80: 

2,219 

2.817 

3.440 

4.313 

8.81 

23 

2.339 

3.028 

3.750; 

4.765 

7.67 


2.795 

3.408 

4.264 

6,69 

24 

2.327 

3.009 

3.721 

4.718 

7.55 

2.195 

2.776 

3.379 

4.218 

6.59 

25 

2.317 

2.991 

3,694 

4.676 

7.45 

2.184 

2.759 


4.177 

8,49 

26 

2,308 

2.975 

3.670 

4.637 

7.36 

2.174 

2 743 

3.329 


6.41 

27 

2.299 

2.960 

3.647 


7.27 

2.166 

2.728 


4.106 

6.33 

28 

2.291 

2.947 

3.626 

4.568 

7.19 

2.157 

2.714 

3.286 

4.074 

6.25 

29 

2.283 

2.934 

3.607 

4,638 

7.12 

2.149 

2.701 

3.267 

4,045 

0.19 

30 

2.276 

2,922 

3.589 

4.510 

7.05 

2.142 

2.690 


4.018 

6.12 

40 

2.226 

2.839 

3.463 

4.313 

6.60 

2.091 

2.606 

3.126 

3.82 S 

5.70 

60 

2.177 

2.758 

3.342 

4.126 

6,17 


2. S 25 


3.649 

5.31 

120 

2.130 

2.680 

3.227 

3,949 

5.79 

1.992 

2.447 

2.894 

3.480 

4.95 

<0 

2.084 

2.605 

3.11® 

3.782 

5.42 

1.945 

2.372 

2.786 

3.319 

4.62 


April 1943, pp. 73-78, “Tables of Percentage Points of the Inverted Beta (F) Distribution,” by Maxine Merrington 
and Catherine M. Thompson. Values of F at the 0.001 point were taken from Table V of R. A. Fisher and F. Yates, 
Statistical Tables for Biological, Agricultural, and Medical Research, Oliver and Boyd, Ltd., Edinburgh, 1949, by 
permission of the authors and publishers. The tables which originally appeared in Biometrika may be found also 
in E. S. Pearson and H. O. Hartley, Biometrika Tables for Statisticians, Volume I, Cambridge University Press, 
Cambridge, 1954, pp. 157-163. This source provided fourteen corrections for the values at the 0.001 point. 
APPENDIX M-Continued 
Values of F 

For Given Degrees of Freedom (n₁ and n₂) and at Selected Upper Points 
Values of F for corresponding lower points may be obtained by transposing the 
values of n₁ and n₂ and computing 1/F. 





ni « 

•5 





6 



BQi 


■Ol 


.001 

.10 

.05 

.025 

.01 1 

,001 

1 

57.241 

mSMl 

921.85 

5,763.7 

576,405 

58.204 

233.99 

937.11 


686,937 

2 

9.293 

19.298 

39.298 

99.299 

999.3 

9.326 

19.330 

39.331 

99.332 

999.8 

3 

5.309 

9.014 

14.885 

28.237 

134.6 

5.285 

8.941 

14.735 

27.911 

132.8 

4 

EltMli 

6.256 

9.364 

15.522 

61.71 

mmm 

6.163 

9.197 

15.207 

50.53 

S 

3.453 

5.050 

7.146 


29.75 

3.404 

4.950 

6.978 

10.672 

28.84 

6 

3.108 

4.387 

5.988 

8.746 

20.81 


4.284 

Bin 

8.466 

20.03 

7 

2.883! 

3.972 

5.285 


16.21 

2.827 

3.866 

5.119 

7.191 

15.52 

8 

wmrm\\ 


4,817 

6.632 

13.49 

2.668 

3.581 

4.652 

6,371 

12.86 

9 

2.611 

3.482 

4.484 

6.057 

11.71 

2.551 

3.374 


5.802 

11.13 

10 

2,622 

3.326 

4.236 

5.636 

10.48 

2.461 

3.217 


5.386 

9.92 

n 

2.451 

3.204 

4.044 

5.316 

9.68 

2.389 


3.881 


9.05 

1 % 

2.394 

3.100 

3.891 

5.064 


2.331 

2.996 

3.728 

4.821 

8.38 

13 

2.347 

3.025 

3.767 

4.862 

8.35 

2.283 

2.915 

3.604 


7.86 

14 

mmn 


3.663 

4.695 

7.92 

2.243 

2.848 

hmi 

4.456 

7.43 

15 

2.273 

2.901 

3.576 

4.656 

7.67 

^2 208 

2,790 

3.416 

4.318 

7.09 

16 

2,244 

2.852 

3.502 

4.437 

7.27 

2.178 

2.741 

3.341 


6.81 

17 

2.218 

2.810 

3.438 

4.336 

7.02 

2.152 

2.699 

3.277 

4.102 

6.68 

18 

2.196 

2.773 

3.382 

4.248 

6.81 

msm 

2.661 

3.221 

4.016 

6.35 

19 

2.176 

2.740 

3.333 

4.171 

6.62 

2.109 

2.628 

3.172 

3.939 

6.18 

20 

2.158 

2.711 

3.289 

4.103 

6.46 

2.091 

2.599 

3.128 

3.871 

6.02 

21 

2,142 

2.685 

3.250 

4.042 

6.32 


2,573 

3,090 

8.812 

5,88 

22 

2.128 

2.661 

3.215 


6.19 

2.060 


3.065 

8.758 

5.76 

23 

2.115 

2.640 

3.184 

3.939 

6.08 

2.047 

2.528 

3.023 

8.710 

5,66 

24 

2.103 

2.621 

3.155 

3.895 

5.98 

2.035 


2.095 

8.667 

5.55 

25 

2.092 


3.129 

3.855 

5,88 

2.024 

2,490 

2.969 

3.627 

5.46 

26 

HBil 

2.587 

3.105 

3.818 

5.80 

2.014 

2.474 

2.945 

3.591 

5.38 

27 

2.073 

2.572 

3.083 

3.785 

5.73 

mm 

2.459 

2.923 

3.558 

5.81 

28 

2,064 

2,558 

3.082 

3.754 


1.996 

2.445 

2.903 

3.528 

5,24 

29 

2.057 

2.545 

3.044 

3.725 

5.59 

1.988 

2.432 

2.884 

3.499 

5,18 

30 

2.049 



3.699 

5.53 


2.421 

2.867 

8.474 

5.12 

40 

1,997 


BH 

3.514 

5.13 

1.927 

2.336 

2,744 

3.291 

4.73 


1,946 

2.368 


3.339 

4.76 

1.875 

2.264 

2.627 

3.119 

4.37 

120 

L 896 

2.290 

2,674 

3.174 

4.42 

1.824 

2.176 

2.615 

2.966 

4,04 

m 

1.847 

2,214 

2.666 

3.017 

4.10 

1.774 



2.802 

3.74 


































APPENDIX M-Continued 

Values of F 

For Given Degrees of Freedom (n₁ and n₂) and at Selected Upper Points 
Values of F for corresponding lower points may be obtained by transposing the 
values of n₁ and n₂ and computing 1/F. 



. 


n, « 

8 


n ,-12 

n* 

MM 

,05 

mm 


.001 

.10 

.05 

.025 

.01 

.001 

1 

59.439 

238.88 

956.68 

5,981.6 

598.144 


243.91 

076.71 

6,106.3 

610,667 

2 

9.387 

19.371 

39.373 

99.374 

999.4 


19.413 

39.416 

99.416 

999.4 

3 

5.252 

8.845 

14.540 

27.489 

130.8 

5.216 

8.745 

14.337 

27.052 

128.3 

4 

3.955 

6.041 

8.980 

14.799 


3.896 

6.012 

8.75! 

14.374 

47.41 

fi 

3.839 

4.818 

8.767 

10.289 

27.64 

K^ll 

4.678 

6.525 

0.888 

26.42 

6 

2.983 

4.147 


8.102 


2.905 


6.366 

7.718 

17.99 

7 

2.752 

3.726 

4.899 

6.840 

14.63 

2.668 

3.575 

4.668 

8.460 

13.71 

8 

2.689 

3.438 

4.433 


12.04 

rnwm 

3.284 


6.667 

11.19 

0 

2.469 

3,230 


5.467 

10.37 

2.379 

3.073 

3.868 

6 .m 

0.57 

m 

2.377 


3.855 


9.20 

2.284 

2.013 

3.621 

4.706 

8.45 

u 



3,664 

4,745 

8,35 


2.788 


4.397 

7.63 

12 

2.245 

2.849 

3.512 

4.499 

7.71 

2.147 

2.687 

3.277 

4.155 

7.00 

'*3 

2.195 

2,767 

3.388 

4.302 

7.21 

2.097 


3.163 

3.960 

6.52 

14 

2.154 

2.699 

3.285 

4.140 


2.064 

2.534 


3.800 

6.13 

15 

2.118 

2,641 

3.199 

4.004 

6.47 


2.475 

2.063 

3.666 

5.81 

18 

2.088 

2,691 

3.125 

3,890 

6.19 

1.986 

2.425 

2.889 

3.553 

5.55 

17 


2.648 


3.791 


1.958 

2.381 

2.826 

3.455 

5.32 

18 


Kldld 


3.705 

6.70 

■WtKHl 

2.342 

2.789 

3.371 

5.13 

10 

utiiy 

2.477 


3.631 

5.59 

1.012 

2.308 

2.720 

3.296 

4.97 

20 

1.998 

2.447 


3.664 

5.44 

1.892 

2.278 

2,676 

8.231 

4.82 

21 

1.982 

2.421 

2,874 

3.506 

5.31 

1.875 


2.637 

3.173 

4.70 

22 

1.967 

2,397 

2,839 

3.453 

5.19 

1.859 

2.226 

2.602 

8.121 

4.58 

23 

1.953 

2.375 

2.808 

3.408 


1.845 

2.204 

2.570 

3.074 

4.48 

24 

1.941 

2.355 

2.779 

3.363 

4.99 

1.832 

2.183 

2.641 

3.032 

4.39 

25 

1.929 

2.837 

2.753 

3.324 

4.91 

1.820 

2.165 

2.515 

2.093 

4.31 


1,919 

2.321 

2.729 

3.288 

4.83 

mi 

2.148 

2.491 

2,058 

4.24 

27 

1.909 


2.707 

3.258 

4.76 

1.799 

2.132 

2.469 

2.026 

4.17 

28 

1.900 

2.291 


3.226 

4.60 

mWm 

2.118 

2.448 

2.806 

4 ,U 

29 

1.892 

2.278 

2.669 

3.198 

4,84 

1.781 

2.104 

2.430 

2.869 

4.05 

30 

1.884 

2,286 

2.651 

3.173 

4.58 

1.773 

2.092 

2.412 

2.843 

4.00 


IJ 20 

2,180 

2.629 


4.21 

1.716 


2.288 

2.665 

8.64 

80 

1.775 


2.412 

2. S 23 

8.87 

1.657 

1.917 

2.160 

2.496 

8.31 

m 

1.722 

2.018 


2.663 

8.55 

■Ssn 

1.834 


2,338^ 

8.02 

m 

1,670 

1.038 

2.192 

2.511 

3.27 

1.648 

1,752 

1.945 

2.185^ 

2.74 
APPENDIX M-Concluded 
Values of F 

For Given Degrees of Freedom (n₁ and n₂) and at Selected Upper Points 
Values of F for corresponding lower points may be obtained by transposing the 
values of n₁ and n₂ and computing 1/F. 






24 




n, « 

CO 


na 

.10 


wmm 


.001 

■■ 

.05 


.01 

.001 

1 



997.25 

6,234.6 

623,497 

63.328 

254.32 

1,018.3 

6,366.0 

83^619 

2 

9,450 

19.454 

39.456 

99.458 

999.5 

9.491 

19 496 

39.49'8 

99.501 

999.5 

3 

5.176 

8.638 

14.124 

26.698 


5.134 

8.527 

13.902 

26.125 

123.5 

4 

3.831 

5.774 

8.511 

13.929 


3.761 

5.628 

8.257 

13.463 

44.05 

5 

3.190 

4.527 

6.278 

9,467 

25.14 

3.105 

4.365 

6.025 

9.020 

23.79 

6 

2.818 

3,841 

5.117 

7.313 

16.89 

2.722 

3.669 

4.849 

6.880 

15.75 

7 

2.575 

3.410 

4.415 

6.074 

12 73 

2.471 

3.230 

4.142 

5.650 

11.70 

8 

2.404 

3.115 

3 947 

5,279 


2.293 

2.928 

3.670 

4.859 

9.33 

9 

2.277 


3.614 

4.729 


2.159 

2,707 

3.333 

4.311 

7.81 

10 

2,178 

2.737 

3.365 

4.327 

7.64 

2.055 

2.538 

3.080 

3.909 

6.76 

11 


2.609 

3.172 

4.021 


1.972 

2.405 

2,883 

3.602 

6.00 

12 

WWm 

2.505 

3.019 


6.25 

1.904 

2.296 

2.725 

3.361 

6.42 

13 

1.983 

2.420 

2.893 

8.587 

5.78 

1.846 

2.206 

2.696 

3.165 

4.97 

14 

1.938 

2.349 

2.789 

3.427 

5.«:i 

1.797 

2.131 

2,487 

3.004 

4.60 

15 

1.899 

2.288 

2.701 

3.294 

5.10 

1.755 

2.066 

2.395 

2.868 

4.51 

16 

1.866 

2.235 

2.625 

3.181 

4.85 

1.718 

2.010 

2.316 

2.753 

4.06 

17 

1.836 

2.190 

2,560 


4.63 

1.686 

1.960 

2.247 

1 2,653 

3.85 

18 

1.810 

2 150 

2.503 

2.999 

4.45 

1,657 

1.917 

2.187 

! 2.666 

3.67 

19 

1.787 


2.452 

2 925 

4.29 

1.631 

1.878 

2.133 

2.489 

3.51 

20 

1,767 

2.083 


2.859 

4.15 

1.607 

1.843 

2.085 

i 2.431 

t 

3.38 

21 

1.748 

2.054 

2.368 

2.801 

4.03 

1.586 

1.812 

2.042 

2.360 


22 

1.731 

2 028 

2.332 

2.749 

3.92 

1.567 

1.783 

2.003 

2.305 

3.15 

23 

1.716 


2.299 


3 82 

1.549 

1.757 

1.968 

2.256 

3 05 

24 

1.702 

1.984 

2.269 

2.659 

3.74 

1.533 

1.733 

1.935 

2.211 

2 97 

25 

1.689 

1.964 

2.242 

2.620 

3.66 

1.518 

1.711 

1,906 

2.169 

2.89 

26 ’ 

1.677 

1.946 

2,217 

2.585 

3.59 

1.504 

1.691 

1.878 

2.132 

2.82 

27 : 

1.666 


2.195 

2.552 

3.52 

1.491 

1.672 

1.853 

2,096 

2.75 

28 

1.656 

1,915 

2.174 

2.522 

3,46 

1.478 

1.654 

1.829 

2.064 


29 

1.646 

1.901 

2.164 

2.495 

3.41 

1.467 

1.638 

1.807 

2.034 

2 64 

30 

1.638 

1.887 

2.136 

2.469 

3.36 

1.456 

1,622 

1.787 

2.Q06 

2.69 


1.574 

1.793 

2.007 

2.288 

3.01 

1.377 

1.509 

1.637 

1.805 



1.511 

1.700 

1.882 

2.115 

2.69 

1.292 

1,389 1 

1.482* 

1.601 


120 

1.447 

1.608 


1.950 

2.40 

1.193 

1,254* 

1.310 

t.aso 


<00 

1.383 

1.517 

1.640 

1.791 

2.13 

1.000 

1.000^ 

1.000 

1.000 
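
The note at the head of Appendix M, on obtaining lower points of F by transposing n₁ and n₂ and taking the reciprocal, can be illustrated with a short sketch. The dictionary holds only two upper 0.05 entries copied from the table above, and the names `UPPER_05` and `lower_point` are illustrative, not part of the original text.

```python
# Upper 0.05 points of F copied from Appendix M, keyed by (n1, n2).
UPPER_05 = {
    (3, 5): 5.410,
    (5, 3): 9.014,
}


def lower_point(upper_table: dict, n1: int, n2: int) -> float:
    """Lower point of F for (n1, n2) at the same probability level.

    Transpose the degrees of freedom, look up the upper point for
    (n2, n1), and take its reciprocal.
    """
    return 1.0 / upper_table[(n2, n1)]


# Lower 0.05 point for n1 = 3, n2 = 5 uses the upper point for n1 = 5, n2 = 3.
print(round(lower_point(UPPER_05, 3, 5), 4))
```

With these entries the lower 0.05 point for n₁ = 3 and n₂ = 5 is 1/9.014, or about 0.111, so an observed F below that value would fall in the lower 5 per cent tail.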





























APPENDIX N 


Values of L at the 0.05 and 0.01 Points for Specified Values 

of Nᵢ and k, when N₁ = N₂ = ··· = Nₖ = Nᵢ 

1 * ■» 

If L has been computed from samples of varying size, take Nᵢ equal to 
- 1 provided that no sample consists of fewer than 16 or 20 items. 


This table shows 
the black area: 




N . 

* 3 

AT. 

B 

D 

« 5 

iV , - 6 


« 7 


» 8 

Ni 

» 0 

k 


.01 


.01 


.01 

.05 

.01 

.05 

.01 

.05 


.05 

.01 

~ 

312 

.141 

.478 

.284 

.685 

.398 

.656 

.435 

.70 S 

.551 

.745 


.775 

.645 

% 

.304 

.262 

.470 

.314 

.676 

.429 

.648 

.614 

.700 

.678 

.739 


.769 

.667 

4 

.315 

.iss 

.480 

.345 

.585 

.459 

.656 

,542 

.707 

.604 

.744 

.652 

,774 

.688 

6 

.328 

.210 

.491 

.370 

.595 

.484 

.665 

.565 

.714 

.624 

.751 

.670 

.780 

.706 

3 

.339 

.230 

.502 

.391 

.604 

.504 

.673 

.683 

.721 

.641 

.757 

.685 

.785 

.720 

7 

.350 

.246 

.512 

.409 

,612 

.520 

.680 

.597 

,727 

.654 

.763 

.697 

.790 

.730 

8 

.359 

.260 

.520 

.424 

.620 

.534 

.686 

,610 

.733 

.665 

.768 

.707 

.795 

.740 

9 

.367 

.273 

.527 

.437 

,626 

.645 

.691 

.620 

.?38 

.674 

.772 

.715 

.798 

.747 

10 

.374 

.284 

.534 

.448 

.631 

.555 

.696 

.629 

.742 

.682 

.776 

,722 

.802 

,763 

12 

.387 

.303 

.545 

.467 

.641 

.672 

.704 

.644 

.749 

.696 

.782 

.734 

.807 

.764 

14 

.397 

.318 

.554 

.431 

.649 

.585 

.711 

.655 

.755 

.706 

.787 

.744 

.812 

.773 

16 


.331 

.561 

.493 

.655 

.696 

.716 

.666 

.759 

.714 

.791 

.751 

.816 

.779 

IS 

.412 

.342 

.667 

.504 

.660 i 

.605 

.721 

.672 

.763 

,721 

.795 

.766 

.819 

.784 

20 

.418 

.352 

.573 

.512 

.665 

.613 

.725 

.679 

.767 

.727 

.798 

.761 

.822 

.788 

22 

.424 

.360 

.577 

.520 

.669 

.619 

.728 

.684 

.770 

.732 

.800 

.765 

.824 

.798 

24 

.428 

.367 

.581 

.526 

.672 

.624 

.731 

.688 

772 

.736 

.802 

.768 

.826 

.795 

26 

.433 

.373 

.585 

.532 

.675 

.629 

.734 

,693 

.775 

.740 

.805 ’ 

,772 

.828 

.798 

28 


,379 

.589 

.637 

,678 

.634 

.^^36 

.697 

.777 

.744 

.807 

.776 

.829 

.802 

30 

Ifll 

.386 

.592 

.643 

.681 

.639 

.739 

.703 

.779 

.748 

.809 

.781 

.831 

.805 



iV'f « 10 

Ni 

« 12 


« 15 

IHI 


AT . 

« 60 

■1 

■■I 

■■ 

* 

13 

.01 

.05 

.01 

m 

.01 

.05 

.01 

.06 

.01 

.05 

.01 

.06 

.01 

2 

.798 

.678 

.833 

.730 

.868 

,783 

.902 

.836 

.935 

.890 

.968 

.945 

1.000 

1.000 

3 

.792 

.699 

.828 

.748 

,863 

.798 

.898 

.848 

.933 

.898 

.967 

.949 

1.000 

1.000 

4* 

.797 

.719 

.832 

.765 

.866 

.812 

.900 

.859 

.934 

.906 

.967 

.953 

1.000 

1.000 

6 

.802 

.735 

.836 

.779 

.870 

.823 

.903 

,867 

.936 

.911 

.968 

.956 

1.000 

1.000 

6 

.808 

.748 

.841 

.789 

.873 

•832 

.906 

.874 

.938 

.916 

.969 

.958 

1.000 

1.000 

7 

.812 

,767 

.844 

.798 

,876 

•839 

.908 

,879 

.939 

• 920 

.970 

.960 

1,000 

1.000 

S 

.816 

.766 

.848 

.805 

.879 

.844 

,910 

.884 

.941 

,923 

.971 

.962 

1.000 

1,000 

9 

.819 

.773 

.851 

.811 

.881 

•849 

.912 

.887 

.942 

• 925 

.971 

.963 

1.000 

1.000 

10 

.822 

.779 

.863 

.816 

.883 

.853 

.913 

.890 

.943 

.927 

.972 

.964 

1.000 

1.000 

12 

.828 

.789 

.857 

.824 

.887 

•860 

.916 

.896 

.944 

.931 

,973 

.966 

1.000 

1.000 

14 

.832 

,796 

.861 

.831 

.890 

.865 

.918 

.900 

.946 

.933 

,973 

.967 

1,000 

1.000 

16 

.835 

.802 

.863 

.836 

.892 

,870 

,920 

,903 

.947 

.936 

,974 

• 968 

1.000 

1.000 

IS 

.838 

.807 

.866 

,840 

,894 

.878 

.921 

,905 

.948 

.937 

.974 

*969 

1.000 

1.000 

20 

.840 

.all 

.868 

.844 

.896 

• 876 

,922 

.908 

.949 

• 939 

,.976 

• 970 

1.000 

1.000 

22 

.843 

.814 

.870 

.847 

.897 

.878 

.924 

• 909 

.960 

• 940 

.976 

.970 

1.000 1 

1.000 

24 

,844 

.817 

.872 

,850 

.898 


.924 

•911 

.960 

• 941 

.976 

.971 

1.000 

1.000 

26 

.846 

.820 

.873 

.852 

.899 

^ ' ilf 9 

.925 

.912 

.961 

•943 

.976 

• 971 

1.000 ■ 

1.000 

28 

.848 

.823 

.874 

.854 

,000 

1 "efif B 

.926 

• 914 

.961 

• 943 

,976 

• 972 ^ 

LOGO 

1.000 

30 

.849 

.827 



.901 

iiip 

,927 

.915 

,962 ^ 

.944 

,976 

.972 

1,000 

: i .0 O 0 


Based on a table in “An Investigation into the Application of Neyman and Pearson's 
L₁ Test, with Tables of Percentage Limits,” by P. P. N. Nayer, Statistical 
Research Memoirs, Vol. I (1936), pp. 38-51, by permission of the author. An earlier 
table of the same nature is given in “Tables for the Application of L-Tests,” by 
P. C. Mahalanobis, Sankhyā: The Indian Journal of Statistics, Vol. I, Part 1 (June 
1933), pp. 109-122. 
APPENDIX O 


Upper 0.10 and 0.02 Limits of β₁ 
When Computed from Random 
Samples from a Normal Population 


This table shows 
the black area: 


     N    0.10    0.02 

    50    .285    .619 
    75    .198    .424 
   100    .152    .321 
   125    .123    .258 
   150    .103    .216 
   175    .089    .185 
   200    .078    .162 
   250    .063    .130 
   300    .053    .108 
   350    .045    .093 
   400    .040    .081 
   450    .035    .072 
   500    .032    .065 
   550    .029    .059 
   600    .027    .054 
   650    .025    .050 
   700    .023    .046 
   750    .021    .043 
   800    .020    .041 
   850    .019    .038 
   900    .018    .036 
   950    .017    .034 
  1000    .016    .032 
  1200    .013    .027 
  1400    .012    .023 
  1600    .010    .020 
  1800    .009    .018 
  2000    .008    .016 
  2500    .006    .013 
  3000    .005    .011 
  3500    .005    .009 
  4000    .004    .008 
  4500    .004    .007 
  5000    .003    .006 


Taken, by permission, from a table given by Egon S. Pearson in his 
article "A Further Development of Tests of Normality," Biometrika, 
Vol. XXII, pages 239 ff. A similar table for √β1 is given in E. S. 
Pearson and H. O. Hartley, Biometrika Tables for Statisticians, 
Volume I, Cambridge University Press, Cambridge, 1954, p. 183. 
APPENDIX P 


Upper and Lower 0.05 and 0.01 
Limits of β2 When Computed from 
Random Samples from a Normal 
Population 


This table shows 
the black areas: 




           Lower limits       Upper limits
   N      0.01     0.05      0.05     0.01

  100     2.18     2.35      3.77     4.39
  125     2.24     2.40      3.70     4.24
  150     2.29     2.45      3.65     4.14
  175     2.33     2.48      3.61     4.05
  200     2.37     2.51      3.57     3.98
  250     2.42     2.55      3.52     3.87
  300     2.46     2.59      3.47     3.79
  350     2.50     2.62      3.44     3.72
  400     2.52     2.64      3.41     3.67
  450     2.55     2.66      3.39     3.63
  500     2.57     2.67      3.37     3.60
  550     2.58     2.69      3.35     3.57
  600     2.60     2.70      3.34     3.54
  650     2.61     2.71      3.33     3.52
  700     2.62     2.72      3.31     3.50
  750     2.64     2.73      3.30     3.48
  800     2.65     2.74      3.29     3.46
  850     2.66     2.74      3.28     3.45
  900     2.66     2.75      3.28     3.43
  950     2.67     2.76      3.27     3.42
 1000     2.68     2.76      3.26     3.41
 1200     2.71     2.78      3.24     3.37
 1400     2.72     2.80      3.22     3.34
 1600     2.74     2.81      3.21     3.32
 1800     2.76     2.82      3.20     3.30
 2000     2.77     2.83      3.18     3.28
 2500     2.79     2.85      3.16     3.25
 3000     2.81     2.86      3.15     3.22
 3500     2.82     2.87      3.14     3.21
 4000     2.83     2.88      3.13     3.19
 4500     2.84     2.88      3.12     3.18
 5000     2.85     2.89      3.12     3.17


Taken, by permission, from a table given by Egon S. Pearson in his article "A Further 
Development of Tests of Normality," Biometrika, Vol. XXII, pages 239 ff. A similar 
table is given in E. S. Pearson and H. O. Hartley, Biometrika Tables for Statisticians, 
Volume I, Cambridge University Press, Cambridge, 1954, p. 184. 
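
As an illustration of how the limits in Appendices O and P might be applied, the following minimal Python sketch computes β1 and β2 for a sample and checks them against limits read from the tables. It assumes the usual moment definitions, β1 = m3²/m2³ and β2 = m4/m2², with moments taken about the mean using the divisor N; the function names and the choice of which limits to pass in are hypothetical and are not taken from the text.

```python
from statistics import fmean


def beta_coefficients(sample):
    """Return (beta1, beta2) using moments about the mean with divisor N (assumed definition)."""
    n = len(sample)
    mean = fmean(sample)
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2


def within_tabled_limits(sample, beta1_upper, beta2_lower, beta2_upper):
    """True if beta1 does not exceed its upper limit and beta2 lies between its limits.

    The three limits must be read from Appendices O and P for the sample size in hand.
    """
    beta1, beta2 = beta_coefficients(sample)
    return beta1 <= beta1_upper and beta2_lower <= beta2 <= beta2_upper
```

For a sample of N = 100, for example, the tables give .152 as the upper 0.10 limit of β1 and 2.35 and 3.77 as the lower and upper 0.05 limits of β2, so one might call `within_tabled_limits(sample, 0.152, 2.35, 3.77)`. Sample sizes not listed in the tables would need interpolation, which this sketch does not attempt.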




APPENDIX Q 


Squares, Square Roots, and 
Reciprocals, 1-1,000 
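
Any entry of this table can be checked by direct computation. The short Python sketch below formats one row in the same order as the table (number, square, square root to seven decimal places, reciprocal to nine decimal places). The function name `appendix_q_row` is hypothetical, and ordinary floating-point rounding is assumed to be adequate at this precision.

```python
from math import sqrt


def appendix_q_row(n):
    """Format number, square, square root (7 places) and reciprocal (9 places)."""
    return f"{n:>5} {n * n:>9} {sqrt(n):>11.7f} {1 / n:>12.9f}"


# Print the first ten rows for comparison with the printed table.
for n in range(1, 11):
    print(appendix_q_row(n))
```

Note that from 101 onward the printed table omits the leading ".00" of each reciprocal, whereas this sketch always prints the full decimal.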



No.    Square   Square Root    Reciprocal

  1         1    1.0000000    1.000000000
  2         4    1.4142136    0.500000000
  3         9    1.7320508     .333333333
  4        16    2.0000000     .250000000
  5        25    2.2360680     .200000000
  6        36    2.4494897     .166666667
  7        49    2.6457513     .142857143
  8        64    2.8284271     .125000000
  9        81    3.0000000     .111111111
 10      1 00    3.1622777     .100000000
 11      1 21    3.3166248     .090909091
 12      1 44    3.4641016     .083333333
 13      1 69    3.6055513     .076923077
 14      1 96    3.7416574     .071428571
 15      2 25    3.8729833     .066666667
 16      2 56    4.0000000     .062500000
 17      2 89    4.1231056     .058823529
 18      3 24    4.2426407     .055555556
 19      3 61    4.3588989     .052631579
 20      4 00    4.4721360     .050000000
 21      4 41    4.5825757     .047619048
 22      4 84    4.6904158     .045454545
 23      5 29    4.7958315     .043478261
 24      5 76    4.8989795     .041666667
 25      6 25    5.0000000     .040000000
 26      6 76    5.0990195     .038461538
 27      7 29    5.1961524     .037037037
 28      7 84    5.2915026     .035714286
 29      8 41    5.3851648     .034482759
 30      9 00    5.4772256     .033333333
 31      9 61    5.5677644     .032258065
 32     10 24    5.6568542     .031250000
 33     10 89    5.7445626     .030303030
 34     11 56    5.8309519     .029411765
 35     12 25    5.9160798     .028571429
 36     12 96    6.0000000     .027777778
 37     13 69    6.0827625     .027027027
 38     14 44    6.1644140     .026315789
 39     15 21    6.2449980     .025641026
 40     16 00    6.3245553     .025000000
 41     16 81    6.4031242     .024390244
 42     17 64    6.4807407     .023809524
 43     18 49    6.5574385     .023255814
 44     19 36    6.6332496     .022727273
 45     20 25    6.7082039     .022222222
 46     21 16    6.7823300     .021739130
 47     22 09    6.8556546     .021276596
 48     23 04    6.9282032     .020833333
 49     24 01    7.0000000     .020408163
 50     25 00    7.0710678     .020000000


No.    Square   Square Root    Reciprocal

 51     26 01    7.1414284     .019607843
 52     27 04    7.2111026     .019230769
 53     28 09    7.2801099     .018867925
 54     29 16    7.3484692     .018518519
 55     30 25    7.4161985     .018181818
 56     31 36    7.4833148     .017857143
 57     32 49    7.5498344     .017543860
 58     33 64    7.6157731     .017241379
 59     34 81    7.6811457     .016949153
 60     36 00    7.7459667     .016666667
 61     37 21    7.8102497     .016393443
 62     38 44    7.8740079     .016129032
 63     39 69    7.9372539     .015873016
 64     40 96    8.0000000     .015625000
 65     42 25    8.0622577     .015384615
 66     43 56    8.1240384     .015151515
 67     44 89    8.1853528     .014925373
 68     46 24    8.2462113     .014705882
 69     47 61    8.3066239     .014492754
 70     49 00    8.3666003     .014285714
 71     50 41    8.4261498     .014084507
 72     51 84    8.4852814     .013888889
 73     53 29    8.5440037     .013698630
 74     54 76    8.6023253     .013513514
 75     56 25    8.6602540     .013333333
 76     57 76    8.7177979     .013157895
 77     59 29    8.7749644     .012987013
 78     60 84    8.8317609     .012820513
 79     62 41    8.8881944     .012658228
 80     64 00    8.9442719     .012500000
 81     65 61    9.0000000     .012345679
 82     67 24    9.0553851     .012195122
 83     68 89    9.1104336     .012048193
 84     70 56    9.1651514     .011904762
 85     72 25    9.2195445     .011764706
 86     73 96    9.2736185     .011627907
 87     75 69    9.3273791     .011494253
 88     77 44    9.3808315     .011363636
 89     79 21    9.4339811     .011235955
 90     81 00    9.4868330     .011111111
 91     82 81    9.5393920     .010989011
 92     84 64    9.5916630     .010869565
 93     86 49    9.6436508     .010752688
 94     88 36    9.6953597     .010638298
 95     90 25    9.7467943     .010526316
 96     92 16    9.7979590     .010416667
 97     94 09    9.8488578     .010309278
 98     96 04    9.8994949     .010204082
 99     98 01    9.9498744     .010101010
100   1 00 00   10.0000000     .010000000
No.     Square   Square Root   Reciprocal .00

101    1 02 01   10.0498756   9900990
102    1 04 04   10.0995049   9803922
103    1 06 09   10.1488916   9708738
104    1 08 16   10.1980390   9615385
105    1 10 25   10.2469508   9523810
106    1 12 36   10.2956301   9433962
107    1 14 49   10.3440804   9345794
108    1 16 64   10.3923048   9259259
109    1 18 81   10.4403065   9174312
110    1 21 00   10.4880885   9090909
111    1 23 21   10.5356538   9009009
112    1 25 44   10.5830052   8928571
113    1 27 69   10.6301458   8849558
114    1 29 96   10.6770783   8771930
115    1 32 25   10.7238053   8695652
116    1 34 56   10.7703296   8620690
117    1 36 89   10.8166538   8547009
118    1 39 24   10.8627805   8474576
119    1 41 61   10.9087121   8403361
120    1 44 00   10.9544512   8333333
121    1 46 41   11.0000000   8264463
122    1 48 84   11.0453610   8196721
123    1 51 29   11.0905365   8130081
124    1 53 76   11.1355287   8064516
125    1 56 25   11.1803399   8000000
126    1 58 76   11.2249722   7936508
127    1 61 29   11.2694277   7874016
128    1 63 84   11.3137085   7812500
129    1 66 41   11.3578167   7751938
130    1 69 00   11.4017543   7692308
131    1 71 61   11.4455231   7633588
132    1 74 24   11.4891253   7575758
133    1 76 89   11.5325626   7518797
134    1 79 56   11.5758369   7462687
135    1 82 25   11.6189500   7407407
136    1 84 96   11.6619038   7352941
137    1 87 69   11.7046999   7299270
138    1 90 44   11.7473401   7246377
139    1 93 21   11.7898261   7194245
140    1 96 00   11.8321596   7142857
141    1 98 81   11.8743421   7092199
142    2 01 64   11.9163753   7042254
143    2 04 49   11.9582607   6993007
144    2 07 36   12.0000000   6944444
145    2 10 25   12.0415946   6896552
146    2 13 16   12.0830460   6849315
147    2 16 09   12.1243557   6802721
148    2 19 04   12.1655251   6756757
149    2 22 01   12.2065556   6711409
150    2 25 00   12.2474487   6666667
151    2 28 01   12.2882057   6622517
152    2 31 04   12.3288280   6578947
153    2 34 09   12.3693169   6535948
154    2 37 16   12.4096736   6493506
155    2 40 25   12.4498996   6451613
156    2 43 36   12.4899960   6410256
157    2 46 49   12.5299641   6369427
158    2 49 64   12.5698051   6329114
159    2 52 81   12.6095202   6289308
160    2 56 00   12.6491106   6250000
161    2 59 21   12.6885775   6211180
162    2 62 44   12.7279221   6172840
163    2 65 69   12.7671453   6134969
164    2 68 96   12.8062485   6097561
165    2 72 25   12.8452326   6060606
166    2 75 56   12.8840987   6024096
167    2 78 89   12.9228480   5988024
168    2 82 24   12.9614814   5952381
169    2 85 61   13.0000000   5917160
170    2 89 00   13.0384048   5882353
171    2 92 41   13.0766968   5847953
172    2 95 84   13.1148770   5813953
173    2 99 29   13.1529464   5780347
174    3 02 76   13.1909060   5747126
175    3 06 25   13.2287566   5714286
176    3 09 76   13.2664992   5681818
177    3 13 29   13.3041347   5649718
178    3 16 84   13.3416641   5617978
179    3 20 41   13.3790882   5586592
180    3 24 00   13.4164079   5555556
181    3 27 61   13.4536240   5524862
182    3 31 24   13.4907376   5494505
183    3 34 89   13.5277493   5464481
184    3 38 56   13.5646600   5434783
185    3 42 25   13.6014705   5405405
186    3 45 96   13.6381817   5376344
187    3 49 69   13.6747943   5347594
188    3 53 44   13.7113092   5319149
189    3 57 21   13.7477271   5291005
190    3 61 00   13.7840488   5263158
191    3 64 81   13.8202750   5235602
192    3 68 64   13.8564065   5208333
193    3 72 49   13.8924440   5181347
194    3 76 36   13.9283883   5154639
195    3 80 25   13.9642400   5128205
196    3 84 16   14.0000000   5102041
197    3 88 09   14.0356688   5076142
198    3 92 04   14.0712473   5050505
199    3 96 01   14.1067360   5025126
200    4 00 00   14.1421356   5000000
No.     Square   Square Root   Reciprocal .00

201    4 04 01   14.1774469   4975124
202    4 08 04   14.2126704   4950495
203    4 12 09   14.2478068   4926108
204    4 16 16   14.2828569   4901961
205    4 20 25   14.3178211   4878049
206    4 24 36   14.3527001   4854369
207    4 28 49   14.3874946   4830918
208    4 32 64   14.4222051   4807692
209    4 36 81   14.4568323   4784689
210    4 41 00   14.4913767   4761905
211    4 45 21   14.5258390   4739336
212    4 49 44   14.5602198   4716981
213    4 53 69   14.5945195   4694836
214    4 57 96   14.6287388   4672897
215    4 62 25   14.6628783   4651163
216    4 66 56   14.6969385   4629630
217    4 70 89   14.7309199   4608295
218    4 75 24   14.7648231   4587156
219    4 79 61   14.7986486   4566210
220    4 84 00   14.8323970   4545455
221    4 88 41   14.8660687   4524887
222    4 92 84   14.8996644   4504505
223    4 97 29   14.9331845   4484305
224    5 01 76   14.9666295   4464286
225    5 06 25   15.0000000   4444444
226    5 10 76   15.0332964   4424779
227    5 15 29   15.0665192   4405286
228    5 19 84   15.0996689   4385965
229    5 24 41   15.1327460   4366812
230    5 29 00   15.1657509   4347826
231    5 33 61   15.1986842   4329004
232    5 38 24   15.2315462   4310345
233    5 42 89   15.2643375   4291845
234    5 47 56   15.2970585   4273504
235    5 52 25   15.3297097   4255319
236    5 56 96   15.3622915   4237288
237    5 61 69   15.3948043   4219409
238    5 66 44   15.4272486   4201681
239    5 71 21   15.4596248   4184100
240    5 76 00   15.4919334   4166667
241    5 80 81   15.5241747   4149378
242    5 85 64   15.5563492   4132231
243    5 90 49   15.5884573   4115226
244    5 95 36   15.6204994   4098361
245    6 00 25   15.6524758   4081633
246    6 05 16   15.6843871   4065041
247    6 10 09   15.7162336   4048583
248    6 15 04   15.7480157   4032258
249    6 20 01   15.7797338   4016064
250    6 25 00   15.8113883   4000000
251    6 30 01   15.8429795   3984064
252    6 35 04   15.8745079   3968254
253    6 40 09   15.9059737   3952569
254    6 45 16   15.9373775   3937008
255    6 50 25   15.9687194   3921569
256    6 55 36   16.0000000   3906250
257    6 60 49   16.0312195   3891051
258    6 65 64   16.0623784   3875969
259    6 70 81   16.0934769   3861004
260    6 76 00   16.1245155   3846154
261    6 81 21   16.1554944   3831418
262    6 86 44   16.1864141   3816794
263    6 91 69   16.2172747   3802281
264    6 96 96   16.2480768   3787879
265    7 02 25   16.2788206   3773585
266    7 07 56   16.3095064   3759398
267    7 12 89   16.3401346   3745318
268    7 18 24   16.3707055   3731343
269    7 23 61   16.4012195   3717472
270    7 29 00   16.4316767   3703704
271    7 34 41   16.4620776   3690037
272    7 39 84   16.4924225   3676471
273    7 45 29   16.5227116   3663004
274    7 50 76   16.5529454   3649635
275    7 56 25   16.5831240   3636364
276    7 61 76   16.6132477   3623188
277    7 67 29   16.6433170   3610108
278    7 72 84   16.6733320   3597122
279    7 78 41   16.7032931   3584229
280    7 84 00   16.7332005   3571429
281    7 89 61   16.7630546   3558719
282    7 95 24   16.7928556   3546099
283    8 00 89   16.8226038   3533569
284    8 06 56   16.8522995   3521127
285    8 12 25   16.8819430   3508772
286    8 17 96   16.9115345   3496503
287    8 23 69   16.9410743   3484321
288    8 29 44   16.9705627   3472222
289    8 35 21   17.0000000   3460208
290    8 41 00   17.0293864   3448276
291    8 46 81   17.0587221   3436426
292    8 52 64   17.0880075   3424658
293    8 58 49   17.1172428   3412969
294    8 64 36   17.1464282   3401361
295    8 70 25   17.1755640   3389831
296    8 76 16   17.2046505   3378378
297    8 82 09   17.2336879   3367003
298    8 88 04   17.2626765   3355705
299    8 94 01   17.2916165   3344482
300    9 00 00   17.3205081   3333333
No.     Square   Square Root   Reciprocal .00

301    9 06 01   17.3493516   3322259
302    9 12 04   17.3781472   3311258
303    9 18 09   17.4068952   3300330
304    9 24 16   17.4355958   3289474
305    9 30 25   17.4642492   3278689
306    9 36 36   17.4928557   3267974
307    9 42 49   17.5214155   3257329
308    9 48 64   17.5499288   3246753
309    9 54 81   17.5783958   3236246
310    9 61 00   17.6068169   3225806
311    9 67 21   17.6351921   3215434
312    9 73 44   17.6635217   3205128
313    9 79 69   17.6918060   3194888
314    9 85 96   17.7200451   3184713
315    9 92 25   17.7482393   3174603
316    9 98 56   17.7763888   3164557
317   10 04 89   17.8044938   3154574
318   10 11 24   17.8325545   3144654
319   10 17 61   17.8605711   3134796
320   10 24 00   17.8885438   3125000
321   10 30 41   17.9164729   3115265
322   10 36 84   17.9443584   3105590
323   10 43 29   17.9722008   3095975
324   10 49 76   18.0000000   3086420
325   10 56 25   18.0277564   3076923
326   10 62 76   18.0554701   3067485
327   10 69 29   18.0831413   3058104
328   10 75 84   18.1107703   3048780
329   10 82 41   18.1383571   3039514
330   10 89 00   18.1659021   3030303
331   10 95 61   18.1934054   3021148
332   11 02 24   18.2208672   3012048
333   11 08 89   18.2482876   3003003
334   11 15 56   18.2756669   2994012
335   11 22 25   18.3030052   2985075
336   11 28 96   18.3303028   2976190
337   11 35 69   18.3575598   2967359
338   11 42 44   18.3847763   2958580
339   11 49 21   18.4119526   2949853
340   11 56 00   18.4390889   2941176
341   11 62 81   18.4661853   2932551
342   11 69 64   18.4932420   2923977
343   11 76 49   18.5202592   2915452
344   11 83 36   18.5472370   2906977
345   11 90 25   18.5741756   2898551
346   11 97 16   18.6010752   2890173
347   12 04 09   18.6279360   2881844
348   12 11 04   18.6547581   2873563
349   12 18 01   18.6815417   2865330
350   12 25 00   18.7082869   2857143
351   12 32 01   18.7349940   2849003
352   12 39 04   18.7616630   2840909
353   12 46 09   18.7882942   2832861
354   12 53 16   18.8148877   2824859
355   12 60 25   18.8414437   2816901
356   12 67 36   18.8679623   2808989
357   12 74 49   18.8944436   2801120
358   12 81 64   18.9208879   2793296
359   12 88 81   18.9472953   2785515
360   12 96 00   18.9736660   2777778
361   13 03 21   19.0000000   2770083
362   13 10 44   19.0262976   2762431
363   13 17 69   19.0525589   2754821
364   13 24 96   19.0787840   2747253
365   13 32 25   19.1049732   2739726
366   13 39 56   19.1311265   2732240
367   13 46 89   19.1572441   2724796
368   13 54 24   19.1833261   2717391
369   13 61 61   19.2093727   2710027
370   13 69 00   19.2353841   2702703
371   13 76 41   19.2613603   2695418
372   13 83 84   19.2873015   2688172
373   13 91 29   19.3132079   2680965
374   13 98 76   19.3390796   2673797
375   14 06 25   19.3649167   2666667
376   14 13 76   19.3907194   2659574
377   14 21 29   19.4164878   2652520
378   14 28 84   19.4422221   2645503
379   14 36 41   19.4679223   2638522
380   14 44 00   19.4935887   2631579
381   14 51 61   19.5192213   2624672
382   14 59 24   19.5448203   2617801
383   14 66 89   19.5703858   2610966
384   14 74 56   19.5959179   2604167
385   14 82 25   19.6214169   2597403
386   14 89 96   19.6468827   2590674
387   14 97 69   19.6723156   2583979
388   15 05 44   19.6977156   2577320
389   15 13 21   19.7230829   2570694
390   15 21 00   19.7484177   2564103
391   15 28 81   19.7737199   2557545
392   15 36 64   19.7989899   2551020
393   15 44 49   19.8242276   2544529
394   15 52 36   19.8494332   2538071
395   15 60 25   19.8746069   2531646
396   15 68 16   19.8997487   2525253
397   15 76 09   19.9248588   2518892
398   15 84 04   19.9499373   2512563
399   15 92 01   19.9749844   2506266
400   16 00 00   20.0000000   2500000
No.     Square   Square Root   Reciprocal .00

401   16 08 01   20.0249844   2493766
402   16 16 04   20.0499377   2487562
403   16 24 09   20.0748599   2481390
404   16 32 16   20.0997512   2475248
405   16 40 25   20.1246118   2469136
406   16 48 36   20.1494417   2463054
407   16 56 49   20.1742410   2457002
408   16 64 64   20.1990099   2450980
409   16 72 81   20.2237484   2444988
410   16 81 00   20.2484567   2439024
411   16 89 21   20.2731349   2433090
412   16 97 44   20.2977831   2427184
413   17 05 69   20.3224014   2421308
414   17 13 96   20.3469899   2415459
415   17 22 25   20.3715488   2409639
416   17 30 56   20.3960781   2403846
417   17 38 89   20.4205779   2398082
418   17 47 24   20.4450483   2392344
419   17 55 61   20.4694895   2386635
420   17 64 00   20.4939015   2380952
421   17 72 41   20.5182845   2375297
422   17 80 84   20.5426386   2369668
423   17 89 29   20.5669638   2364066
424   17 97 76   20.5912603   2358491
425   18 06 25   20.6155281   2352941
426   18 14 76   20.6397674   2347418
427   18 23 29   20.6639783   2341920
428   18 31 84   20.6881609   2336449
429   18 40 41   20.7123152   2331002
430   18 49 00   20.7364414   2325581
431   18 57 61   20.7605395   2320186
432   18 66 24   20.7846097   2314815
433   18 74 89   20.8086520   2309469
434   18 83 56   20.8326667   2304147
435   18 92 25   20.8566536   2298851
436   19 00 96   20.8806130   2293578
437   19 09 69   20.9045450   2288330
438   19 18 44   20.9284495   2283105
439   19 27 21   20.9523268   2277904
440   19 36 00   20.9761770   2272727
441   19 44 81   21.0000000   2267574
442   19 53 64   21.0237960   2262443
443   19 62 49   21.0475652   2257336
444   19 71 36   21.0713075   2252252
445   19 80 25   21.0950231   2247191
446   19 89 16   21.1187121   2242152
447   19 98 09   21.1423745   2237136
448   20 07 04   21.1660105   2232143
449   20 16 01   21.1896201   2227171
450   20 25 00   21.2132034   2222222
451   20 34 01   21.2367606   2217295
452   20 43 04   21.2602916   2212389
453   20 52 09   21.2837967   2207506
454   20 61 16   21.3072758   2202643
455   20 70 25   21.3307290   2197802
456   20 79 36   21.3541565   2192982
457   20 88 49   21.3775583   2188184
458   20 97 64   21.4009346   2183406
459   21 06 81   21.4242853   2178649
460   21 16 00   21.4476106   2173913
461   21 25 21   21.4709106   2169197
462   21 34 44   21.4941853   2164502
463   21 43 69   21.5174348   2159827
464   21 52 96   21.5406592   2155172
465   21 62 25   21.5638587   2150538
466   21 71 56   21.5870331   2145923
467   21 80 89   21.6101828   2141328
468   21 90 24   21.6333077   2136752
469   21 99 61   21.6564078   2132196
470   22 09 00   21.6794834   2127660
471   22 18 41   21.7025344   2123142
472   22 27 84   21.7255610   2118644
473   22 37 29   21.7485632   2114165
474   22 46 76   21.7715411   2109705
475   22 56 25   21.7944947   2105263
476   22 65 76   21.8174242   2100840
477   22 75 29   21.8403297   2096436
478   22 84 84   21.8632111   2092050
479   22 94 41   21.8860686   2087683
480   23 04 00   21.9089023   2083333
481   23 13 61   21.9317122   2079002
482   23 23 24   21.9544984   2074689
483   23 32 89   21.9772610   2070393
484   23 42 56   22.0000000   2066116
485   23 52 25   22.0227155   2061856
486   23 61 96   22.0454077   2057613
487   23 71 69   22.0680765   2053388
488   23 81 44   22.0907220   2049180
489   23 91 21   22.1133444   2044990
490   24 01 00   22.1359436   2040816
491   24 10 81   22.1585198   2036660
492   24 20 64   22.1810730   2032520
493   24 30 49   22.2036033   2028398
494   24 40 36   22.2261108   2024291
495   24 50 25   22.2485955   2020202
496   24 60 16   22.2710575   2016129
497   24 70 09   22.2934968   2012072
498   24 80 04   22.3159136   2008032
499   24 90 01   22.3383079   2004008
500   25 00 00   22.3606798   2000000


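No.     Square   Square Root   Reciprocal .00

501   25 10 01   22.3830293   1996008
502   25 20 04   22.4053565   1992032
503   25 30 09   22.4276615   1988072
504   25 40 16   22.4499443   1984127
505   25 50 25   22.4722051   1980198
506   25 60 36   22.4944438   1976285
507   25 70 49   22.5166605   1972387
508   25 80 64   22.5388553   1968504
509   25 90 81   22.5610283   1964637
510   26 01 00   22.5831796   1960784
511   26 11 21   22.6053091   1956947
512   26 21 44   22.6274170   1953125
513   26 31 69   22.6495033   1949318
514   26 41 96   22.6715681   1945525
515   26 52 25   22.6936114   1941748
516   26 62 56   22.7156334   1937984
517   26 72 89   22.7376340   1934236
518   26 83 24   22.7596134   1930502
519   26 93 61   22.7815715   1926782
520   27 04 00   22.8035085   1923077
521   27 14 41   22.8254244   1919386
522   27 24 84   22.8473193   1915709
523   27 35 29   22.8691933   1912046
524   27 45 76   22.8910463   1908397
525   27 56 25   22.9128785   1904762
526   27 66 76   22.9346899   1901141
527   27 77 29   22.9564806   1897533
528   27 87 84   22.9782506   1893939
529   27 98 41   23.0000000   1890359
530   28 09 00   23.0217289   1886792
531   28 19 61   23.0434372   1883239
532   28 30 24   23.0651252   1879699
533   28 40 89   23.0867928   1876173
534   28 51 56   23.1084400   1872659
535   28 62 25   23.1300670   1869159
536   28 72 96   23.1516738   1865672
537   28 83 69   23.1732605   1862197
538   28 94 44   23.1948270   1858736
539   29 05 21   23.2163735   1855288
540   29 16 00   23.2379001   1851852
541   29 26 81   23.2594067   1848429
542   29 37 64   23.2808935   1845018
543   29 48 49   23.3023604   1841621
544   29 59 36   23.3238076   1838235
545   29 70 25   23.3452351   1834862
546   29 81 16   23.3666429   1831502
547   29 92 09   23.3880311   1828154
548   30 03 04   23.4093998   1824818
549   30 14 01   23.4307490   1821494
550   30 25 00   23.4520788   1818182
551   30 36 01   23.4733892   1814882
552   30 47 04   23.4946802   1811594
553   30 58 09   23.5159520   1808318
554   30 69 16   23.5372046   1805054
555   30 80 25   23.5584380   1801802
556   30 91 36   23.5796522   1798561
557   31 02 49   23.6008474   1795332
558   31 13 64   23.6220236   1792115
559   31 24 81   23.6431808   1788909
560   31 36 00   23.6643191   1785714
561   31 47 21   23.6854386   1782531
562   31 58 44   23.7065392   1779359
563   31 69 69   23.7276210   1776199
564   31 80 96   23.7486842   1773050
565   31 92 25   23.7697286   1769912
566   32 03 56   23.7907545   1766784
567   32 14 89   23.8117618   1763668
568   32 26 24   23.8327506   1760563
569   32 37 61   23.8537209   1757469
570   32 49 00   23.8746728   1754386
571   32 60 41   23.8956063   1751313
572   32 71 84   23.9165215   1748252
573   32 83 29   23.9374184   1745201
574   32 94 76   23.9582971   1742160
575   33 06 25   23.9791576   1739130
576   33 17 76   24.0000000   1736111
577   33 29 29   24.0208243   1733102
578   33 40 84   24.0416306   1730104
579   33 52 41   24.0624188   1727116
580   33 64 00   24.0831892   1724138
581   33 75 61   24.1039416   1721170
582   33 87 24   24.1246762   1718213
583   33 98 89   24.1453929   1715266
584   34 10 56   24.1660919   1712329
585   34 22 25   24.1867732   1709402
586   34 33 96   24.2074369   1706485
587   34 45 69   24.2280829   1703578
588   34 57 44   24.2487113   1700680
589   34 69 21   24.2693222   1697793
590   34 81 00   24.2899156   1694915
591   34 92 81   24.3104916   1692047
592   35 04 64   24.3310501   1689189
593   35 16 49   24.3515913   1686341
594   35 28 36   24.3721152   1683502
595   35 40 25   24.3926218   1680672
596   35 52 16   24.4131112   1677852
597   35 64 09   24.4335834   1675042
598   35 76 04   24.4540385   1672241
599   35 88 01   24.4744765   1669449
600   36 00 00   24.4948974   1666667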
No.     Square   Square Root   Reciprocal .00

601   36 12 01   24.5153013   1663894
602   36 24 04   24.5356883   1661130
603   36 36 09   24.5560583   1658375
604   36 48 16   24.5764115   1655629
605   36 60 25   24.5967478   1652893
606   36 72 36   24.6170673   1650165
607   36 84 49   24.6373700   1647446
608   36 96 64   24.6576560   1644737
609   37 08 81   24.6779254   1642036
610   37 21 00   24.6981781   1639344
611   37 33 21   24.7184142   1636661
612   37 45 44   24.7386338   1633987
613   37 57 69   24.7588368   1631321
614   37 69 96   24.7790234   1628664
615   37 82 25   24.7991935   1626016
616   37 94 56   24.8193473   1623377
617   38 06 89   24.8394847   1620746
618   38 19 24   24.8596058   1618123
619   38 31 61   24.8797106   1615509
620   38 44 00   24.8997992   1612903
621   38 56 41   24.9198716   1610306
622   38 68 84   24.9399278   1607717
623   38 81 29   24.9599679   1605136
624   38 93 76   24.9799920   1602564
625   39 06 25   25.0000000   1600000
626   39 18 76   25.0199920   1597444
627   39 31 29   25.0399681   1594896
628   39 43 84   25.0599282   1592357
629   39 56 41   25.0798724   1589825
630   39 69 00   25.0998008   1587302
631   39 81 61   25.1197134   1584786
632   39 94 24   25.1396102   1582278
633   40 06 89   25.1594913   1579779
634   40 19 56   25.1793566   1577287
635   40 32 25   25.1992063   1574803
636   40 44 96   25.2190404   1572327
637   40 57 69   25.2388589   1569859
638   40 70 44   25.2586619   1567398
639   40 83 21   25.2784493   1564945
640   40 96 00   25.2982213   1562500
641   41 08 81   25.3179778   1560062
642   41 21 64   25.3377189   1557632
643   41 34 49   25.3574447   1555210
644   41 47 36   25.3771551   1552795
645   41 60 25   25.3968502   1550388
646   41 73 16   25.4165301   1547988
647   41 86 09   25.4361947   1545595
648   41 99 04   25.4558441   1543210
649   42 12 01   25.4754784   1540832
650   42 25 00   25.4950976   1538462
651   42 38 01   25.5147016   1536098
652   42 51 04   25.5342907   1533742
653   42 64 09   25.5538647   1531394
654   42 77 16   25.5734237   1529052
655   42 90 25   25.5929678   1526718
656   43 03 36   25.6124969   1524390
657   43 16 49   25.6320112   1522070
658   43 29 64   25.6515107   1519757
659   43 42 81   25.6709953   1517451
660   43 56 00   25.6904652   1515152
661   43 69 21   25.7099203   1512859
662   43 82 44   25.7293607   1510574
663   43 95 69   25.7487864   1508296
664   44 08 96   25.7681975   1506024
665   44 22 25   25.7875939   1503759
666   44 35 56   25.8069758   1501502
667   44 48 89   25.8263431   1499250
668   44 62 24   25.8456960   1497006
669   44 75 61   25.8650343   1494768
670   44 89 00   25.8843582   1492537
671   45 02 41   25.9036677   1490313
672   45 15 84   25.9229628   1488095
673   45 29 29   25.9422435   1485884
674   45 42 76   25.9615100   1483680
675   45 56 25   25.9807621   1481481
676   45 69 76   26.0000000   1479290
677   45 83 29   26.0192237   1477105
678   45 96 84   26.0384331   1474926
679   46 10 41   26.0576284   1472754
680   46 24 00   26.0768096   1470588
681   46 37 61   26.0959767   1468429
682   46 51 24   26.1151297   1466276
683   46 64 89   26.1342687   1464129
684   46 78 56   26.1533937   1461988
685   46 92 25   26.1725047   1459854
686   47 05 96   26.1916017   1457726
687   47 19 69   26.2106848   1455604
688   47 33 44   26.2297541   1453488
689   47 47 21   26.2488095   1451379
690   47 61 00   26.2678511   1449275
691   47 74 81   26.2868789   1447178
692   47 88 64   26.3058929   1445087
693   48 02 49   26.3248932   1443001
694   48 16 36   26.3438797   1440922
695   48 30 25   26.3628527   1438849
696   48 44 16   26.3818119   1436782
697   48 58 09   26.4007576   1434720
698   48 72 04   26.4196896   1432665
699   48 86 01   26.4386081   1430615
700   49 00 00   26.4575131   1428571




APPENDIX Q 


773 


No. 

Squa?a 

Square Root 

Reciprocal 


No. 

Square 

Square Root 

Bl^SSSII 

701 

49 14 01 

20.4764046 

1426534 


751 

56 40 01 

^ 

27.4043792 

1331558 

702 

49 28 0 1 

26. 4052826 

1424501 


752 

50 55 04 

27.4226184 

1329787 

703 

49 42 OU 

20 0141472 

1422475 


753 

56 70 09 

27.4408455 

1328021 

704 

49 56 16 

20.5329083 

1420455 ^ 


754 

56 85 16 

27.4590604 

1326260 

705 

49 70 25 

20.5518361 

1418440 


755 


27.4772633 

1324503 

70G 

49 84 30 

2G.5706G05 

1416431 


75G 

.5710 36 

27.4954542 

1322751 

707 

49 9S 49 

2G.5S9471G 

1414427 


757 

57 30 49 

27.6136330 

1321004 

70S 

50 12 64 

26.60S269S 

1412429 


758 

57 45 64 

27.6317998 

1319261 

709 

50 26 81 

2G. 6270539 

1410437 


759 

57 60 81 

27.5499546 

131X|23 

710 

60 41 00 

26.6458252 

1408451 


760 


27.6680975 

1315789 

711 

60 65 21 

26,6045833 

1-10C47O 


761 

57 91 21 

S7.58G2284 

1314060 

712 

50 69 44 

2G.6S332S1 

1404494 


w&m 

58 0844 

27.6043475 

1312336 

713 

60 83 69 

26.7020598 

1402525 


763 

58 21 69 

27.6224546 

1310616 

714 

50 97 96 

26.7207784 

1400500 


764 

58 36 96 



715 

51 12 25 

26-7394839 

1398601 


765 




716 

51 2656 

26.7581763 

1396648 


766 




717 

51 40 89 

2G.776S557 

1394700 


767 

58 82 89 

27.6947648 


718 

51 55 24 

26.7055220 

1392758 


768 

58 98 24 

27.7128129 


719 

51 69 61 

20.8141754 

1390821 


769 

59 13 61 

27.7308492 


720 

51 S4 00 

26.8328157 

13SSS89 


rati 


27.7488739 


721 

51 98 41 

26.8514432 

1386963 


771 

69 4*1 41 



722 

52 12 $4 

26.8700577 

13S5042 


772 

69 59 84 


1295337 1 

723 

52 27 29 

26.88S6593 

1SS3126 


773 

59 7529 

27.8028775 


724 

52 41 70 

26.9072481 

1381215 


774 

59 90 76 


1291990 1 

725 

52 56 25 

26.9258240 

1379310 


775 

60 06 25 



726 

52 70 76 

26.9443872 

1377410 


776 

60 2176 

27.8567766 


727 

52 85 29 

26.9629375 

1375516 


777 

60 37 29 



728 

52 99 84 

26.9814751 

1373626 


778 

60 52 84 



729 

53 14 41 

27.0000000 

1371742 


779 

60 68 41 

27.9105715 

i 1283697 1 

730 

53 29 00 

27.0185122 

1360863 


780 




731 

53 43 61 

27.0370117 

1367989 


781 

00 99 61 

1 27.9463772 

1280410 

732 

53 58 24 

27.0554985 

1366120 

1 

783 

61 15 24 


1278772 

733 

53 72 89 

27.0739727 

1364256 

i 

1 

783 


37.9821372 

1277139 

734 

53 87 56 

27.0924344 

1362398 

1 

784 

61 46 56 

28.0000000 

1275510 

735 

54 02 25 

27.110SS34 

1360544 


785 

61 02 25 

28.0178515 


736 

54 16 96 

27.1293199 

1358696 


786 

Cl 77 96 

28.0356915 


737 

54 31 69 

27.1477439 

1356852 


787 

61 93 09 

28.0535203 

1270648 1 

738 

5446 44 

27.1661554 

1355014 


788 

62 09 44 

28.0713377 

mmsm 

739 

54 61 21 

27.1845544 

1353180 


789 

62 25 21 

28.0891438 

1267427 1 

740 

54' 76 00 

27.2029410 

1351351 


790 


28. 1069386 


741 

54 90 81 

27.2213152 

1349528 


791 

62 56 81 

28.1247222 

1264223 

742 

55 05 64 

27.2396769 

1347709 


792 

62 72 64 

2Sr 1424946 

1262626 

743 

55 20 49 

27.2580263 

1345895 


793 

62 88 49 

28.1602557 

1261034 

744 

5535 36 

27.2763634 

1344086 


794 

63 04 36 


1259446 

745' 

55 50 25 

27.2946881 

1342282 


795 

63 20 25 

28.1957444 

1257862 

746 

55 6516 

27.3130006 

1340483 


796 

63 36 16 

28.2134720 

1 ^^ 

747 

55 80 09 

27.3313007 

1338688 


797 

63 52 09 

28.2311884 

1254705 

748 

55 95 04 

27.3495887 

1336898 


798 


28.2488938 

1253133 

749 

56 10 01 

27.3678644 

1335113 


799 

63 84 01 

28.2665881 

1251564 

750 

56 25 00 

27.3861279 

1333333 


800 


28.2842712 

1250000 










APPENDIX Q 


No.   Square   Square Root   Reciprocal (.00)   No.   Square   Square Root   Reciprocal (.00)

801 

64 16 01 

28; 3019434 

1248439 


851 

72 42 01 

29.1719043 

1176088 

802 

64 32 04, 

28.3190045 

1246883 


852 

72 5004 

29.1890390 

1173709 

803 

64 48 09 

28.3372546 

1245330 


853 

72 7609 

29.2061637 

1172333 

804 

64 64 16 

28.3548938 

J2437S1 


'854 

72 93 16 

29.2232784 

1170960 

805 

64 80 25 

28.3725219 

1242236 


855 

73 10 25 

29 2403S30 

1169591 

806 

64 95 36 , 

28.39013911 

1240695 


856 



1168224 

807 

65 12 49 

28.4077454 

1239157 


857 

73 44 49 



808 

65 28 64 

28.4253408 

1237624 


858 

73 61 64 




654481 : 

28.4429253 

1236094 


859, 

73 78 81 

29.3087018 


810' 

65 61 00 

28.4604989 

1234568 


860 

73 96 00 

29.3257506 


811 

6577 21 ; 

28.4’r80617 

1233046 


861 

7413 21 


1161440 

812 

65 9344: 

28.4956137 

1231527 


862 

74 30 44 



813 

6609 69 

28.5131549 

1230012 


863 

74 4769 

29.3768616 

1158749 

814 

66 25 96 

28.5306852 

1228501 


864 



H 57407 

815 

66 42 25 

28.6482048 

1226994 


865 

74 8225 

29.4108823 

1156069 

816 

66 58 56 

28.5657137 

1225490 


866 

74 99 56 

29.4278779 

1154734 

817 

66 74 89 

28.5832119 

1223990 


867 

75 16 89 

29.4448637 

1153403 

818 

66 9124 

28.6006993 

1222494 


868 

75 34 24 

29.4618397 

1152074 

819 

6707 61 

28.6181760 

1221001 


869 


29.4788059 

1150748 

820 

67 24 00 

28.6356421 

1219512 


870 


29.4957624 

1149425 

821 

674041 

28.6530976 

1218027 


871 

75 86 41 

29.5127091 

1148106 

822 

67 5684 

28.6705424 

1216545 


872 


29.5296461 

1146789 

823 

67 73 29 

28.6879766 

1215067 


873 

76 21 29 

29.5465734 

1145475 

S24 

67 89 76 

28.7054002 

1213592 


874 

7638 76 

29.6634910 

1144165 

825 I 

68 06 25 

28*7228132 

1212121 


875 

76 5625 

29.5803989 

1142857 

826 i 

08 22 76 

28.7402157 

1210664 


876 

76 73 76 

29.5972972 

1141553 

827 1 

683929 

: 28.7576077 

1209190 


877 

7691 29 

29.6141858 

1140251 

828 : 

6$ 55 $4 

^ 28.7749891 

1207729 


878 

77 OS 84 

29.6310648 

1138952 

829 : 

CB 72 41 

28.7923001 

1206273 


879 

77 26 41 

29.6479342 

1137656 

S30j 

68 89 00 

1 28.8097206 

1204819 


880 

774100 

29.6647939 

1136304 

831 ' 

60 05 61 

' 28.8270706 

1203369 


881 

77 61 61 

29.6816442 

1135074 

832 

69 22 24 

* 28.8444102 

1201923 


8S2 

77 79 24 

29.6984848 

1133787 

833 , 

69 38 89 

2S. 8617394 

1200480 


883 

77 96 S9 

29.7153159 


834! 

69 55 56 

28.8790582 

1199041 


884 

78 14 56 

29.7321375 

1131222 ■ 

835 

09 72 25 

28.8903066 

1197605 


885 j 

78 32 25 

29.7489496 

1129944 

836; 

09 88 96 

28.9136646 

1196172 


886 j 

78 40 96 

29.7657521 

U28668 

837 1 

70 05 69 

28.9309523 

1194743 


ssrl 

78 67 69 

29.7825452 

1127396 

838 ; 

70 22 44 

28.9482297 

1193317 


888; 

78 85 44 

29.7993289 

1126126 

839 

70 39 21 , 

1 28.9654967 

1191895 


889 1 


29.8161030 

1124S59 

840 ! 

70 5600 

28.9827535 

1190476 


890 

79 2! 00 

29.8328678 

1123596 

841 1 

70 72 81 

29.0000000 ; 

1189061 


891 ' 

79 38 81 

29.8496231 

1122334 

842 j 

70 89 64 

29,0172363 

1187648 


892^ 


29.8663690 

1121076 

843 1 

71 06 49 

29.0344623 1 

11SG240 


893 ^ 

79 74 49 

29.8831056 

1119821 

844 ^ 

71 23 36 

29.0510781 

1184834 


894 

79 92 36 

29.8908328 

1118568 

845 

71 40 25 

29.0688837 ; 

X183432 


805 

80 10 25 

29.9165506 

1X17318 

846' 

71 S7 16 

29.0860791 

1182033 


896 


29.9332591 


847 

71 74 09 

29.1032644 

1180638 


897 


29,9499583 

1114827 

848 J 

71 91 04 

29.1204396 

117924^ 


898 

SO 64 04 

29.0660481 : 

1U3S86 

849 i 

72 08 01 

29.1376046 

1177856 


899 

80 82 01 

29.98332S7 

1112347 

850 i 

72 25 00 

29. 1547595 

1176471 


900 


30.0000000 

1111111 












No.   Square   Square Root   Reciprocal (.00)   No.   Square   Square Root   Reciprocal (.00)

901 

81 18 Oi 

S0.01C0620 

1109878 


951 

SO 44 01 

::0.S332S79 

1651525 

902 

s] 30 04 

30.0333148 

llOSGl? 


9h3 

00(33 0-i 

30.85! i€/2 

1050421] 

903 

81 54 09 

30.0490584 

1107420 


953 

00 82 09 

30 . BTOuGSI 


904 

81 72 16 

S0.0C65O28 

110G105 


954 

01 01 16 

30.8568004 

10I82I8 

905 

81 90 25 

30.0832179 

1104972 


955 

91 2C*25 

30. €030713 

i0t7i20 

906 

82 08 36 

30,0998330 

1103753 


956 

91 39 36 

30.9102107 

I04C025 

907 

82 26 49 

30,1164407 

1102530 


957 

91 58 49 

St). 03541 GG 

104493 

ous 

82.!4C'l 

30.1330383 

1101322 


O^S 

0177 61 

30. ! *5 1575 1 

lOiSSfl 

€09 

82 02 81 

30. 1490209 

nooiio 


95v> 

01 90 SI 

3d. 5677:51 

1U127;?. 

910 

82 81 OD 

30.1GG20G3 

1C0S9O1 


960 

92 IG CO 

3C»0S38t;C.S 

1011667 

911 

82 99 21 

30.1897705 

1007G95 


001 

92 35 21 

31.0000000 

1010583 

912 

S3 17 44 

»S0 . 1 9933 i / 

, XG96191 


86:i 

02 54 44 

3l.ClGi24S 

1039501 

013 

S3 35 G9 

30,2158899 

1005200 


903 

92 73 09 

31.0322113 

3038-122 

914 

83 53 90 

30.232.(329 

10940!I2 


961 

02 92 9G 

oi.oisawi 

1037344 

916 

S3 72 25 

30.2489069 

1092890 


065 

931225. 

31.0614491 

li)3G2G0 

OIG 

83 00 56 

30.2654910 

1001703 


OGG 

933! 56' 

31.080.7105 

1035197 

917 

84 03 SO 

30.2820079 

1090513 


967 

93 50 89 

31,0906230 

103412G 

018 

84 27 24 

30.2985148 

1039325 


068 

93 70 24 

SI.I1269S4 

1033058 

919 

84 45 G1 

30.3150128 

1088139 


909 

93 89 G1 

S1.12S7G48 

1031992 

920 

84 64 00 

30.3315018 

1080957 


970 

94 09 00 

ai.HlS230 

1030928 

921 

84 82 41 

30.3479818 

1085776 


071 

94 2S41 

31.1608729 

1029S66 

922 

S5 00 84 

30.3644529 

10S4599 


072 

94 47 8i 

31.37601-55 

102S807 

923 

85 1029 

S0.3S09151 

1083424 


073 

94 67 29 

31.11(29473 

1027749 

024 

85 37 76 

30.3973683 

10S2251 


074 

94 SO 76 

31.2089731 

mmu 

926 

85 56 25 

30.4138127 1 

10810S1 


075 i 

9506 25 

31.2210900 

1025041 

926 

85 74 76 

30,4302481 ! 

1079914 


076 1 

95 2576 

Si.2iC0D87 

1024590 

927 

85 03 20 

30.4406747 

107S749 


077 

9545 29 

31 .2509992 

1023541 

028 

S6 11 84 

30.4630924 

10775SG 


OTS 

95G181 

31.2729015 

1022495 

929 

86 30 41 

30.4795013 

1076426 


979 

95 84 41 

3 1 . 2S897 57 

1021450 

930 

86 49 00 

30.4959014 

1075269 


080 

960100 

31 .3049517 

1020408 

931 

86 67 01 

30.5122026 

1074114^ 


9S1 

96 23 01 

31.3209195 

i 10193G8’ 

932 i 

86 86 24 

30.5286750 

1072901 


0S2 

9G43 24 

31.33OS702 

' 2018300 

933 

87 04 89 

30.5450487 

1071811 


es3 

9G 02 SO 

31.3528303 

; 1017294 

m 

87 23 56 

30.5614136 

1070664 


9S4 

90 S2t5G 

31.3087743 

1016260 

935 

S7 42 25 

30.5777697 

10G9519 


9S5 

97 02 25 

31.3S47097 

101522S 

936 

87 GO 96 

30.5941171 

106S37G 


OSG 

07 21 96 

31.4006399 

1014199 

937 

87 79 69 

30.6104557 

10G7236 


987 

97 41 G9 

31.4165501 

1013171 

938 

87 9844 

i 30.6267857 

106G09S 


OSS 

97 G1 44 

31.4324673 

1012146 

930 

88 17 21 

1 30.6431069 

1064063 


089 

97 SI 21 

31.44S3704 

1011122 

940 

SS 3G 00 

' 30.6591194 

106SS30 


990 

98 01 00 

SI. 46 12054 

lOlOIOl 

941 

88 54 81 

i 30.6757233 

1062699 


991 

9S20S1 

31, 4801525 

1009082 

942 

88 73 64 

30.6920185 

1061571 


992 

984064 

31.4960315 

1008065 

943 

88 92 49 

30.7083051 

1060445 


093 

98 6049 

31.5119025 

1007049 

944 

89 1136 

30.7245830 

1050322 


094 

98 8036 

3L52776rk5 

1006036 

945 

89 30 25 

30.7408523 

1058201 


995 

99 00 25 

31.5436200 

1UO5025 

946 

89 49 16 

30.7571130 

10570S2 


906 

99 201G 

31. 5594077 

1004016 

917 

89 68 09 

30.7733651 

1055966 


907 

99 40 09 

31 . 5753008 

1003009 

948 

ISO 87 04 

30.7896086 

1054852 


908 

1 99 GO 04 

31.5911380 : 

1002004 

949 

;90 06 01 

30.3058436 

1053741 


999 

99 80 01 

31,6000013 ! 

1001001 

950 

i 90 25 00 

30.8220700 

1052632 


1000 

1 00 00 00 

31.G227766 ; 

iOOOOOD 




APPENDIX R 


Common Logarithms of Numbers 


The common logarithm of a number (N in the table) is the power to 
which 10 must be raised to produce N. The adjective “common” indicates 
that a logarithm is to the base 10 rather than to some other base 
(for example, e = 2.71828, the base of “natural” logarithms). When the 
unmodified term “logarithm” is used, it is generally understood that 
common logarithms are meant. A logarithm is composed of two parts, 
the characteristic and the mantissa. 

The characteristic, which is always an integer or zero, is determined 
by the following rule: 


If N ≥ 1, the characteristic is positive and its value is one less than the 
number of digits in N which are to the left of the decimal point. For 
example, 

N            Characteristic 
4568         3 
456.8        2 
45.68        1 
4.568        0 

If N < 1, the characteristic is negative and its value is one more than the 
number of zeros just to the right of the decimal point. For example, 

N            Characteristic 
0.4568       -1 or 9 - 10 
0.04568      -2 or 8 - 10 
0.004568     -3 or 7 - 10 
0.0004568    -4 or 6 - 10 


The mantissa, which is always a decimal or zero, is obtained from a 
table such as that which follows. The mantissa is the same for any given 
combination of digits no matter where the decimal point may be placed. 
Thus, for all of the eight N's just listed, the mantissa is 0.659726. 

Combining the characteristic and the mantissa gives the logarithm. 
For the eight values of N given above, 


N            Logarithm 
4568         3.659726 
456.8        2.659726 
45.68        1.659726 
4.568        0.659726 
0.4568       9.659726 - 10 
0.04568      8.659726 - 10 
0.004568     7.659726 - 10 
0.0004568    6.659726 - 10 
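
The rule above can be checked numerically. The following is a minimal sketch, not part of the original appendix; the function name `characteristic_and_mantissa` is an assumption introduced for illustration. It takes the characteristic as the integer floor of the logarithm and the mantissa as the non-negative remainder.

```python
import math

def characteristic_and_mantissa(n):
    """Split log10(n) into an integer characteristic and a mantissa in [0, 1)."""
    if n <= 0:
        raise ValueError("logarithms are defined only for positive numbers")
    log = math.log10(n)
    characteristic = math.floor(log)   # negative when n < 1
    mantissa = log - characteristic    # same digits wherever the decimal point falls
    return characteristic, mantissa

# The same digit sequence 4568 with the decimal point in different places.
for n in (4568, 45.68, 0.4568, 0.0004568):
    c, m = characteristic_and_mantissa(n)
    print(f"{n:>10}  characteristic {c:+d}  mantissa {m:.6f}")
```

A characteristic of -3, for instance, corresponds to the tabular form 7 - 10 used above.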
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N. 

0   1   2   3   4   5   6   7   8   9   D. 

100 

000000 

000434 

000868 

001301 

001734 

002166 

002598 

003029 

003461 

003891 

432 

1 

432! 

475! 

5181 

5609 

6038 

6466 

6894 

7321 

7748 

8174 

428 

2 

8600 

9026 

9451 

9876 

010300 

010724 

011147 

011570 

011993 

012415 

424 

3 

012837 

013259 

013680 

014100 

4521 

4940 

5360 

5779 

6197 

6616 

429 

4 

7033 

7451 

7868 

8284 

8700 

9116 

9532 

9947 

020361 

020775 

416 

105 

021189 

021603 

022016 

022428 

022841 

023252 

023664 

024075 

4486 

4896 

412 

6 

5306 

5715 

6125 

6533 

6942 

7350 

7757 

8164 

8571 

8978 

408 

7 

9384 

9789 

030195 

030600 

031004 

031408 

031812 

032216 

032619 

033021 

404 

8 

033424 

033826 

4227 

4628 

5029 

5430 

5830 

6230 

6629 

7028 

400 

9 

7426 

7825 

8223 

8620 

9017 

9414 

9811 

040207 

040602 

04cy;|8 

397 

110 

041393 

041787 

042182 

042576 

042969 

043362 

043755 

044148 

044540 

044932 

393 

1 

5323 

5714 

6105 

6495 

6885 

7275 

7664 

8053 

8442 

8830 

390 

2 

9218 

9606 

9993 

050380 

050766 

051153 

051538 

051924 

052309 

052694 

386 

3 

053078 

053463 

053846 

4230 

4613 

4993 

5378 

5760 

6142 

6524 

383 

4 

6905 

7286 

7666 

8046 

8426 

8805 

9185 

9563 

9942 

060320 

379 

115 

060698 

061075 

061452 

061829 

062206 

062582 

062958 

063333 

063709 

4083 

376 

6 

4458 

4832 

5206 

5580 

5953 

6326 

€699 

7071 

7443 

7815 

373 

7 

8186 

8557 

8928 

9298 

9668 

070038 

070407 

070776 

071145 

071514 

370 

8 

071882 

072250 

072617 

072985 

073352 

3718 

4085 

4451 

4816 

5182 

368 

9 

5547 

5912 

6276 

6640 

7004 

7368 

7731 

8094 

8457 

8819 

363 

120 

079181 

079543 

079904 

080266 

080626 

080987 

081347 

081707 

082067 

082425 

360 

1 

082785 

083144 

083503 

3861 

4219 

4576 

4934 

5291 

5647 

6004 

357 

2 

€360 

6716 

7071 

7426 

778! 

8136 

8490 

8845 

9198 

9552 

355 

3 

9905 

090258 

090611 

090963 

091315 

091667 

092018 

092370 

092721 

093071 

352 

4 

093422 

3772 

4122 

4471 

4820 

5169 

5518 

5866 

6215 

6562 

349 

125 

€910 

7257 

7604 

7951 

8298 

8644 

8990 

9335 

968! 

100026 

346 

6 

100371 

100715 

101059 

101403 

101747 

102091 

102434 

102777 

103119 

3462 

343 

7 

3804 

4146 

4487 

4828 

5169 

5510 

5851 

619! 

6531 

6871 

341 

8 

7210 

7549 

7888 

8227 

8565 

8903 

9241 

9579 

9916 

110253 

338 

9 

110590 

110926 

111263 

111599 

111934 

112270 

112605 

112940 

113275 

3609 

335 

130 

113943 

114277 1 

114611 

114944 i 

115278 

I1S611 

115943 

116276 

1 16608 

116940 

333 

1 

7271 

7803 

7934 

8265 

8595 

8926 

9256 

9586 

9915 

120245 

330 

2 

120574 

120903 

121231 

12IS60 

121S88 

122216 

122544 

122871 

123198 

3525 

328 

3 

3852 

4178 

4504 

4830 

5156 i 

5481 

5806 

6131 

6456 

6781 

325 

4 

7105 

7429 

7753 

8076 

8399 

8722 

9045 

9368 

9690 

130012 

323 

13S 

130334 

130655 

130977 

131298 

431619 

131939 

132260 

132580 

132900 

3219 

321 

6 

1 3539 

3858 

4177 

4496 

4814 

5133 

5451 

5769 

6086 

6403 

318 

7 

- 6721 

7037 

7354 

7671 

7987 

8303 

8618 

8934 

9249 

9564 

316 

8 

; 9879 

140194 

140508 

140822 

141136 

141450 

141763 

142076 

142389 

142702 

314 

9 

1 143015 

3327 

3639 

3951 

4263 

4574 

4835 1 

5196 

5507 

5818 

311 

140 

146128 

146438 

1 146748 

1 147058 

147367 

147676 

147985 

148294 

148603 

t489n 1 

309 

1 

9219 

9527 

: 9835 

1 150142 

150449 

150756 

151063 

151370 

151676 

151982 

307 

2 

1 152288 

152594 

152900 

! 3205 

3510 

3815 

4120 

4424 

4728 

5032 

305 

3 

I 5336 

5640 

5943 

! €246 

6549 

6852 

7154 

7457 

7759 

8061 

303 

4 

: 8362 

8664 

8965 

1 9266 

9567 

9868 

160168 

160469 

160769 

161068 

301 

145 

! 161368 

i 161667 

161967 

162266 

162564 

‘ 162863 

3161 

3460 

3758 

4055 

^ 299 

6 

1 4353 

! 4650 

4947 

5244 

5541 

5838 

6134 

6430 

6726 

7022 

297 

7 

1 7317 

7613 

7908 

8203 

8497 

8792 

9086 

9380 

9674 

9968 

295 

8 

1 170262 

i 170555 

170848 

171141 

171434 

171726 

172019 

172311 

172603 

172895 

293 

9 

3186 

3478 

3769 

4060 

435! 

4641 

4932 

5222 

5512 

5802 

291 

150 

176091 

176381 

176670 

176959 

177248 

177536 

177825 

178113 

178401 

178689 

289 

1 

8977 

9264 

9552 

9839 

180126 

180413 

180699 

180986 

181272 

181558 

287 

2 

181844 

182129 

182415 

182700 

2985 

3270 

3555 

3839 

4123 

4407 

285 

3 

4691 

4975 

5259 

5542 

5825 

6108 

€391 

6674 

6956 

7239 

283 

4 

7521 

7803 

8084 

836$ 

8647 

8928 

9209 

9490 

9771 

190051 

281 

1S5 

190332 

190612 

190892 

191171 

191451 

191730 

192010 

192289 

192567 

2846 

279 

6 

3125 

3403 

3681 

3959 

4237 

4514 

4792 

5069 

5346 

5623 

278 

7 

5900 

6176 

6453 

6729 

7005 

7281 

7556 

7832 

8107 

8382 

276 

8 

8657 

8932 

9206 

9481 

9755 

200029 

200303 

200577 

2O08SO 

201124 

274 

9 

201397 

201670 

201943 

202216 

202488 

276! 

3033 

3305 

3577 

3848 

272 

N.   0   1   2   3   4   5   6   7   8   9   D. 


Log e = 0.434294; log π = 0.497150; log √π = 0.248575. 
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r 0 

1   2   3   4   5   6   7   8   9   D. 


160 

204120 

204391 

204663 

204934 

205204 

205475 

205746 

206016 

206286 

206550 

271 

1 

6826 

7096 

7365 

7634 

7904 

8173 

844! 

8710 

8979 

9247 

269 

2 

9515 

9783 

210051 

210319 

210586 

210853 

211121 

211388 

211654 

21192! 

267 

3 

212188 

21 2454 

2720 

2986 

3252 

3518 

3783 

4049 

4314 

4579 

266 

4 

4844 

5109 

5373 

5638 

5902 

6165 

6430 

6694 

0957 

7221 

264 

IBS 

7484 

7747 

8010 

8273^ 

8536 

8798 

9060 

9323 

9585 

9846 

262 

6 

220108 

220370 

22063! 

220892 

221153 

2214^4 

221675 

221936 

222196 

222456 

261 

7 

2716 

2976 

3236 

3496 

3755 

4015 

4274 

4533 

4792 

505! 

259 

8 

5309 

5568 

5826 

6084 

6342 

6600 

6858 

7115 

7372 

7639 

258 

s.. 

1 7887 

8144 

8400 

3657 

8913 

9170 

9426 

9682 

9938 

230193 

2SS 

170 

230449 

230704 

230960 

231215 

231470 

231724 

231979 

232234 

232488 

232742 

255 

1 

2996 

3250 

^b04 

3757 

4011 

4264 

4517 

4770 

5023 

5276 

253 

2 

5528 

5781 

6033 

6285 

6537 

6789 

7041 

7292 

7544 

7795 

252 

3 

8046 

B2£7 

8543 

8799 

9049 

9299 

9550 

9800 

24005Q 

240300 

250 

4 

240549 

2407G3 

241043 

241297 

241546 

241795 

242044 

242293 

2541 

2790 

249 

17S 

3038 

3286 

3534 

3782 

4030 

4277 

4525 

4772 

5019 

S266 

248 

6 

5513 

5759 

6006 

6252 

6499 

6745 

6991 

7237 

7482 

7728 

246 

1 

7973 

8219 

8464 

8709 

8954 

919S 

9443 

9687 

9932 

250176 

245 

S 

250420 

250664 

250908 

251151 

251395 

251838 

251881 

252125 

252368 

2610 

243 

S 

2853 

3036 

3338 

3580 

3822 

4064 

4306 

4548 

4790 

5031 

242 

180 

255273 

255514 

255755 

255996 

256237 

256477 

256718 

256958 

257198 

257439 

241 

1 

7679 

7313 

8158 

8398 

8637 

8877 

9116 

9355 

9594 

9833 

239 

2 

260071 

260310 

260548 

260787 

261025 

261283 

261501 

261739 

261976 

262214 

238 

3 

2451 

2688 

2925 

3162 

3339 

3636 

3873 

4109 

4346 

4582 

237 

4 

4818 

5034 

=290 

5525 

5761 

5996 

6232 

6467 

6702 

6937 

235 

185 

7172 

7405 

7641 

7875 

8110 

8344 

8578 

8812 

9046 

9279 

234 

6 

9513 

9746 

9980 

270213 

270446 

270579 

270912 

271144 

271377 

271609 

233 

7 

271942 

272074 

272306 

2538 

2770 

3001 

3233 

3464 

3698 

3927 

232 

8 

4153 

4389 

4620 

4850 

5081 

5311 

5542 

5772 

6002 

6232 

230 

9 

6462 

6692 

692! 

7151 

7380 

7609 

7838 

8067 

8295 

8525 

229 

190 

278754 

278982 * 

27921! 

279439 

279667 ' 

279895 

280123 

280351 

280578 

280806 

228 

I 

281033 

281261 i 

281488 

281715 

281942 

282169 

2396 

2622 

2849 

3075 

227 

2 

3301 

3527 

3753 

3979 

4205 

4431 i 

4656 

4082 

5107 

5332 

226 

3 

5557 

5782 i 

6007 

6232 

6456 1 

668! 1 

6905 

7130 i 

7354 

7578 

225 

4 ^ 

7802 

8026 : 

8249 

8473 

8696 

8920 1 

9143 

9366 

9589 

9812 

223 

195 

290035 

290257 

290480 

290702 

290925 

291147 1 

291369 

291591 i 

291813 

292034 

222 

6 

2256 

2478 1 

2699 

2920 

3141 i 

J363 

3584 

3804 i 

4025 

4246 

221 

7 

4466 

4687 i 

4907 

5127 

5347 i 

5567 ! 

5787 

6007 

6226 

6446 

220 

8 

6665 

6884 ^ 

7104 

7323 

7542 ^ 

7761 ’ 

7979 

8198 

8416 

8635 

219 

9 

8853 

9071 1 

8283 

9507 

9725 : 

9943 ! 

300161 

300378 

300595 

200813 

218 

200 

301030 

301247 

301464 

301681 

301898 

302114 

302331 

302547 

302764 

302980 

217 

! 

3196 

3412 : 

3628 

3844 

4059 

4275 

449! 

4706 

4921 ! 

5136 

216 

2 

535! 

5566 

578! i 

5996 

6211 

6425 

6839 

6854 

7068 

7282 

215 

3 

7496 

7710 j 

7924 ! 

8137 

8351 

8564 

8778 

8991 

9204 

9417 

213 

4 ; 

9630 : 

9843 ; 

31 0056 

310268 i 

310481 

310693 

310906 

311118 

311330 

311542 

212 

205 

311754 

311966 1 

2177 

2389 

2600 

2812 

3023 

3234 

3445 

3656 

2!l 

6 i 

3867 i 

4078 1 

4289 i 

4499 ; 

4710 

4920 

5130 

5340 

5551 

5760 

210 

7 j 

5970 i 

6180 j 

6390 i 

6599 

6809 

7018 

7227 

7436 

7646 

1 7854 

209 

8 ! 

8063 

8272 

8481 i 

8689 ! 

8898 

9106 

! 9314 

9S22 

; 9730 

f 9938 

208 

9 

320146 

320354 

320562 ! 

320769 ! 

320977 

321184 

: 321391 

321598 

321805 

; 322012 

207 

210 

322219 

322426 

322633 1 

322839 

323046 

323252 

3234S8 

323665 

323871 

: 324077 

206 

I 

4282 

4488 

4694 ! 

4899 

5105 

5310 

5516 

5721 

5926 

€131 

205 

2 

6336 

6541 

6745 1 

6950 

7155 

7359 

7563 

7767 

7972 

8176 

204 

3 

8380 

8583 

8787 i 

8991 

9194 

9398 

9601 

9805 

330008 

330211 

203 

4 

330414 

330617 

330819 1 

331022 

331225 

1 331427 

331630 

: 331832 

2034 

2236 

202 

21 S 

2438 i 

2640 

2842 I 

3044 

3246 

1 3447 

3649 

i 38S0 

4051 

4253 

202 

6 

4454 1 

4655 

4856 ' 

5057 

5257 

i 5458 

5658 

5859 

60S9 

6260 

201 

1 

6460 ! 

6660' 

6860 

7060 

7260 

7459 

7659 

: 7858 

8058 

8257 

200 

8 1 

8456 i 

8656 

8855 

9054 

9253 

9451 

9650 

9849 

340047 

340246 

199 

9i 

340444 i 

340642 

34084! i 

341039 

341237 

341435 

341632 

341830 

2028 

222S 

198 

N.   0   1   2   3   4   5   6   7   8   9   D. 
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H. 

0   1   2   3   4   5   6   7   8   9   D. 


220 

342423 

342620 

342817 

343014 

343212 

343409 

343606 

343802 

343399 

344196 

197 

1 

4392 

4589 

4785 

4981 

5178 

5374 

5570 

5766 

3962 

615*^ 

196 

2 

6353 

6549 

6744 

6939 

7135 

7330 

7525 

im 

7915 

8110 

195 

3 

8305 

8500 

8694 

8889 

9083 

9273 

9472 

9666 

9860 

350054 

194 

4 

350248 

350442 

350636 

350829 

351023 

351216 

351410 

351603 

351796 

1989 

193 

225 

2183 

2375 

2568 

2761 

2954 

3147 

3399 

3532 

3724 

3916 

193 

5 

4108 

4301 

4493 

4685 

4876 

5068 

5260 

5452 

5643 

5834 

192 

7 

6026 

6217 

6408 

6599 

6790 

6981 

7172 

7363 

7554 

7744 

191 

8 

7935 

8125 

8316 

8506 

8696 

8886 

9076 

9266 

9456 

9646 

190 

3 

9835 

360025 

360215 

3S0404 

360593 

360733 

360972 

361161 

361350 

36153-^ 

188 

230 

361728 

3S1917 

362103 

362294 

362482 

382671 

362859 

363048 

363236 

363424 

183 

1 

3612 

3800 

3983 

4176 

4363 

4551 

4739 

4926 

5113 

5301 

188 

2 

5488 

S67S 

5862 

6049 

6236 

6423 

6610 

6796 

6083 

7169 

187 

3 

7356 

7542 

7729 

7915 

8101 

8287 

8473 

8659 

8845 

9030 

186 

4 

9216 

9401 

9587 

9772 

9958 

370143 

370328 

370513 

370698 

370883 

185 

235 

371068 

371253 

371437 

371622 

371806 

1991 

2175 

2360 

2544 

2728 

184 

6 

2912 

3096 

3280 

3464 

3647 

3831 

4015 

4198 

4382 

4565 

184 

7 

4748 

4932 

5115 

5298 

5481 

5664 

5346 

6029 

6212 

6394 

183 

8 

6577 

6759 

6942 

7124 

7306 

7438 

7670 

7852 

8034 

8216 

182 

9 

8398 

8580 

8761 

8943 

9124 

9308 

9487 

9668 

9849 

380030 

181 

240 

380211 

380392 

380573 

380754 

330934 

381115 

381296 

381476 

381 S56 
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3082 

3133 

3183 

3234 

3285 

3335 

3386 

3437 

1 51 

8 

3487 

3538 

3589 

3639 

3690 

3740 

3791 

; 3841 

3892 

3943 

! 51 

9 

3993 

4044 

4094 

4145 

i 4195 

4246 

! 4296 

4347 

4397 

4448 

I 51 

860 

934498 

934549 

934599 

934650 

i 934700 

934751 

1 934801 

934852 

934902 

934953 

50 

t 

5003 

5054 

5104 

5154 

5205 

5255 

; 5306 

5356 

5406 

5457 

50 

2 

5507 

5558 

5608 

5658 

5709 

1 5759 

5809 

5860 

5910 

5960 

50 

3 

€011 

6061 

6111 

6162 

6212 

6262 

1 6313 

6363 

6413 

6463 

50 

4 

6514 

6564 

6614 

6665 

6715 

6765 

6815 

6865 

6916 

6966 

50 

865 

7016 

7066 

7117 

7167 

7217 

7267 

7317 

7367 ! 

7418 

7468 

50 

6 

7518 

7568 

7618 

i 7668 

7718 

7769 

7819 

7869 

7919 

7969 

SO 

7 

8019 

8069 

8119 

8169 

8219 

8269 

8320 

8370 

8420 

8470 

50 

8 

8520 

8570 

8620 

8670 

8720 

8770 

8820 

8870 

8920 

8970 

50 

9 

9020 

9070 

9120 

9170 

9220 

9270 

9320 

9369 

9419 

9469 

50 

870 

939519 

939569 

939619 

939669 

939719 

939769 

949819 

939869 

939918 

939968 

50 

1 

940018 

940068 

940118 

940163 

940218 

940267 1 

S40317 

940367 

940417 

940467 

50 

2 

0516 

0566 

0618 

0666 

0716 ! 

0765 

0815 

0865 

0915 

0964 

50 

3 

1014 

1064 

! 1114 

1163 

1213 

1263 

1313 

1362 

1412 

1462 

50 

4 

, isn 

156! 

1611 

1660 

1710 

1760 

1809 

1859 

1909 1 

1958 

50 

875 

2008 

2058 

2107 

2157 

2207 

2256 ' 

2306 

2355 

2405 

245S 

SO 

6 

2501 

2554 

2603 

2653 

2702 

2752 

2801 

2851 

2901 

2950 

50 

7 

3000 

3049 

3099 

3148 

3198 

3247 

3297 

3346 

3396 

3445 

49 

8 

3495 

3544 

3593 

3643 

3692 

3742 

3791 

3841 

3890 

3939 

49 

9 

3989 

4038 

mm 

4137 

4186 

4236 

4285 

4335 

4384 

4433 

49 

N.   0   1   2   3   4   5   6   7   8   9   D. 
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APPENDIX S 


Demonstrations 


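Section 9.1 

To prove that $\Sigma x = 0$. 

Let $x_1 = X_1 - \bar{X},\; x_2 = X_2 - \bar{X},\; \ldots,\; x_N = X_N - \bar{X}$. 

Then 
$$\Sigma x = \Sigma (X - \bar{X}) = \Sigma X - N\bar{X}.$$ 

But 
$$\bar{X} = \frac{\Sigma X}{N}.$$ 

Therefore, 
$$\Sigma x = \Sigma X - N\,\frac{\Sigma X}{N} = 0.$$ 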
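
As a quick numerical check of this result, the sketch below sums the deviations of a small series from its arithmetic mean. The data and the function name `sum_of_deviations` are illustrative assumptions, not taken from the text.

```python
def sum_of_deviations(values):
    """Return the sum of deviations of the values from their arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(value - mean for value in values)

# Hypothetical series; any series should give zero up to rounding error.
series = [12.0, 15.0, 9.0, 20.0, 14.0]
print(sum_of_deviations(series))
```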


Section 9.2 

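To prove that $\bar{X} = \bar{X}_d + \dfrac{\Sigma d}{N}$. 

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_N}{N}.$$ 

Adding and subtracting $\bar{X}_d$, 

$$\bar{X} = \bar{X}_d + \frac{(X_1 - \bar{X}_d) + (X_2 - \bar{X}_d) + \cdots + (X_N - \bar{X}_d)}{N}.$$ 

But, by definition, 

$$d_1 = X_1 - \bar{X}_d,\quad d_2 = X_2 - \bar{X}_d,\quad \ldots,\quad d_N = X_N - \bar{X}_d.$$ 

Then 

$$\bar{X} = \bar{X}_d + \frac{d_1 + d_2 + \cdots + d_N}{N} = \bar{X}_d + \frac{\Sigma d}{N}.$$ 

If each item is weighted by its frequency, the expression is 

$$\bar{X} = \bar{X}_d + \frac{\Sigma f d}{N}.$$ 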
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Section ,9.3 

To prove that $\bar{X} > G$ for a series of positive values not all the same. 
$X_1$ and $X_N$ are the smallest and largest values of the series. For these 
two values, 

$$(X_1 - X_N)^2 > 0;$$ 
$$X_1^2 - 2X_1X_N + X_N^2 > 0.$$ 

Adding $4X_1X_N$ to both sides of the inequality gives 

$$X_1^2 + 2X_1X_N + X_N^2 > 4X_1X_N.$$ 

Taking the square root, we have 

$$X_1 + X_N > 2\sqrt{X_1X_N} \quad\text{and}\quad \frac{X_1 + X_N}{2} > \sqrt{X_1X_N}.$$ 

If $X_1$ and $X_N$ are each replaced by $\dfrac{X_1 + X_N}{2}$, the value of $\bar{X}$ for the 
entire series is not changed. However, such a replacement increases 
the value of $G$, since $\dfrac{X_1 + X_N}{2} > \sqrt{X_1X_N}$ and the contribution of 
$\left(\dfrac{X_1 + X_N}{2}\right)^2$ to the geometric mean exceeds the original contribution of 
$X_1X_N$. Continually repeating this process for the smallest and largest 
remaining values results in continually increasing the value of $G$, which 
approaches $\bar{X}$, and equals it following the last substitution, since the 
individual values are then all the same. 
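
The replacement process described in this section can be traced numerically. The sketch below is illustrative only; the data and function names are assumptions. At each step the smallest and largest values are replaced by their arithmetic mean, and the arithmetic and geometric means are printed.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # Computed through logarithms; all values must be positive.
    return math.exp(sum(math.log(v) for v in values) / len(values))

def replace_extremes_with_average(values):
    """Replace the smallest and largest values by their arithmetic mean."""
    ordered = sorted(values)
    average = (ordered[0] + ordered[-1]) / 2
    return [average] + ordered[1:-1] + [average]

values = [2.0, 4.0, 8.0, 16.0]   # hypothetical positive series
for step in range(6):
    print(step, round(arithmetic_mean(values), 6), round(geometric_mean(values), 6))
    values = replace_extremes_with_average(values)
```

In this run the arithmetic mean stays fixed while the geometric mean rises toward it at every step.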



Section 9.4 

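To prove that $G > H$ for a series of positive values not all the same. 
$X_1$ and $X_N$ are the smallest and largest values of the series. In the 
preceding section, it was shown that 

$$X_1 + X_N > 2\sqrt{X_1X_N}.$$ 

Multiplying both sides by $\sqrt{X_1X_N}$ gives 

$$\sqrt{X_1X_N}\,(X_1 + X_N) > 2X_1X_N \quad\text{and}\quad \sqrt{X_1X_N} > \frac{2X_1X_N}{X_1 + X_N}.$$ 

But 

$$\frac{2X_1X_N}{X_1 + X_N} = \frac{2}{\dfrac{X_1 + X_N}{X_1X_N}} = \frac{2}{\dfrac{1}{X_1} + \dfrac{1}{X_N}},$$ 

which is $H$ for the two values. Therefore, for these two values, $G > H$. 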
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2XiXAr 

If $X_1$ and $X_N$ are each replaced by their harmonic mean, $\dfrac{2X_1X_N}{X_1 + X_N}$, the 
value of $H$ for the entire series is unchanged. However, such a replacement 
decreases the value of $G$, since $\dfrac{2X_1X_N}{X_1 + X_N} < \sqrt{X_1X_N}$ and the contribution of 
$\left(\dfrac{2X_1X_N}{X_1 + X_N}\right)^2$ to the geometric mean would be less than the 
contribution of $X_1X_N$. Continually repeating this process for the smallest 
and largest remaining values results in continually decreasing the 
value of $G$, which approaches $H$, and equals it following the last substitution, 
since the individual values are then all the same. 
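
A short numerical illustration of the ordering $\bar{X} > G > H$, and of the fact that replacing the extremes by their harmonic mean leaves $H$ unchanged, is sketched below. The data and function names are assumptions made for the example.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def harmonic_mean(values):
    return len(values) / sum(1 / v for v in values)

def replace_extremes_with_harmonic_mean(values):
    """Replace the smallest and largest values by their harmonic mean."""
    ordered = sorted(values)
    h = 2 * ordered[0] * ordered[-1] / (ordered[0] + ordered[-1])
    return [h] + ordered[1:-1] + [h]

values = [2.0, 4.0, 8.0, 16.0]   # hypothetical positive series
print(arithmetic_mean(values), geometric_mean(values), harmonic_mean(values))

replaced = replace_extremes_with_harmonic_mean(values)
print(harmonic_mean(replaced), geometric_mean(replaced))  # H unchanged, G smaller
```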


Section 10.1 

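To prove that $\Sigma d^2$ is smallest when $\bar{X}_d = \bar{X}$; that is, that $\Sigma x^2$ is a 
minimum, where $x = X - \bar{X}$, $d = X - \bar{X}_d$, and $\bar{X}_d$ may be any 
designated value, which may or may not be $\bar{X}$. Then 

$$\Sigma d^2 = \Sigma (X - \bar{X}_d)^2 = \Sigma X^2 - 2\bar{X}_d\,\Sigma X + N\bar{X}_d^2.$$ 

But $\bar{X} = \dfrac{\Sigma X}{N}$ and $\Sigma X = N\bar{X}$, so 

$$\Sigma d^2 = \Sigma X^2 - 2\bar{X}_d N\bar{X} + N\bar{X}_d^2.$$ 

Adding and subtracting $N\bar{X}^2$ gives 

$$\Sigma d^2 = \Sigma X^2 - N\bar{X}^2 + (N\bar{X}^2 - 2\bar{X}_d N\bar{X} + N\bar{X}_d^2)$$ 
$$= \Sigma X^2 - N\bar{X}^2 + N(\bar{X}^2 - 2\bar{X}_d\bar{X} + \bar{X}_d^2)$$ 
$$= \Sigma X^2 - N\bar{X}^2 + N(\bar{X} - \bar{X}_d)^2.$$ 

If $\bar{X}_d$ is either larger or smaller than $\bar{X}$, the third term, $N(\bar{X} - \bar{X}_d)^2$, 
is positive, and therefore $\Sigma d^2$ is smallest when $\bar{X}_d = \bar{X}$, in which case 
$\Sigma d^2 = \Sigma x^2$. 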
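
The final identity, $\Sigma d^2 = \Sigma x^2 + N(\bar{X} - \bar{X}_d)^2$, can be verified for any choice of origin. The sketch below uses a hypothetical series and a helper name, `sum_squared_deviations`, introduced only for this example.

```python
def sum_squared_deviations(values, origin):
    """Sum of squared deviations of the values from a designated origin."""
    return sum((v - origin) ** 2 for v in values)

values = [12.0, 15.0, 9.0, 20.0, 14.0]   # hypothetical series
n = len(values)
mean = sum(values) / n
about_mean = sum_squared_deviations(values, mean)

for origin in (10.0, 14.0, 17.5):
    lhs = sum_squared_deviations(values, origin)
    rhs = about_mean + n * (mean - origin) ** 2
    print(origin, lhs, rhs)   # the two columns agree; lhs is least at the mean
```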

Section 10.2 



Since 
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But since 


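$$\frac{\Sigma X}{N} = \bar{X},$$ 

$$\sigma = \sqrt{\frac{\Sigma x^2}{N}} = \sqrt{\frac{\Sigma X^2}{N} - \bar{X}^2}.$$ 

Therefore, 

$$\sigma = \sqrt{\frac{\Sigma X^2}{N} - \left(\frac{\Sigma X}{N}\right)^2}.$$ 

Since $d = X - \bar{X}_d$, or $X = d + \bar{X}_d$, 

$$\sigma = \sqrt{\frac{\Sigma (d + \bar{X}_d)^2}{N} - \left[\frac{\Sigma (d + \bar{X}_d)}{N}\right]^2}$$ 

$$= \sqrt{\frac{\Sigma (d^2 + 2d\bar{X}_d + \bar{X}_d^2)}{N} - \frac{(\Sigma d + N\bar{X}_d)^2}{N^2}}$$ 

$$= \sqrt{\frac{\Sigma d^2 + 2\bar{X}_d\,\Sigma d + N\bar{X}_d^2}{N} - \frac{(\Sigma d)^2 + 2N\bar{X}_d\,\Sigma d + N^2\bar{X}_d^2}{N^2}}$$ 

$$= \sqrt{\frac{\Sigma d^2}{N} - \frac{(\Sigma d)^2}{N^2}}$$ 

$$= \sqrt{\frac{\Sigma d^2}{N} - \left(\frac{\Sigma d}{N}\right)^2}.$$ 

For a frequency distribution, 

$$\sigma = \sqrt{\frac{\Sigma f x^2}{N}} = \sqrt{\frac{\Sigma f d^2}{N} - \left(\frac{\Sigma f d}{N}\right)^2}.$$ 

Or, with deviations $d'$ expressed in terms of class intervals (so that $d = i\,d'$, where $i$ is the width of the class interval), 

$$\sigma = i\,\sqrt{\frac{\Sigma f (d')^2}{N} - \left(\frac{\Sigma f d'}{N}\right)^2}.$$ 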
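
A minimal sketch of the short-cut formula, under the assumption that the standard deviation is taken with divisor $N$ as in this section. The data and function names are illustrative, not from the text. The result from an arbitrary origin is compared with the direct computation about the mean.

```python
import math

def std_direct(values):
    """Standard deviation from deviations about the arithmetic mean (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)

def std_from_assumed_mean(values, assumed_mean):
    """Standard deviation from deviations d about an arbitrary origin."""
    n = len(values)
    d = [v - assumed_mean for v in values]
    return math.sqrt(sum(x * x for x in d) / n - (sum(d) / n) ** 2)

values = [12.0, 15.0, 9.0, 20.0, 14.0]   # hypothetical series
print(std_direct(values))
print(std_from_assumed_mean(values, 10.0))  # same value from a different origin
```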

Section 10.3 

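To prove that 

$$\pi_3 = \frac{\Sigma x^3}{N} = \frac{\Sigma d^3}{N} - 3\,\frac{\Sigma d}{N}\,\frac{\Sigma d^2}{N} + 2\left(\frac{\Sigma d}{N}\right)^3.$$ 

It was shown in Section 9.2 that $\bar{X} = \bar{X}_d + \dfrac{\Sigma d}{N}$. 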
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For any selected X "value, say X.i, xi — Xi — X = Xi 

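That is, 

$$x_1 = X_1 - \bar{X} = X_1 - \bar{X}_d - \frac{\Sigma d}{N}.$$ 

But $X_1 - \bar{X}_d = d_1$; therefore, $x_1 = d_1 - \dfrac{\Sigma d}{N}$. 

Similarly, 

$$x_2 = d_2 - \frac{\Sigma d}{N},\quad x_3 = d_3 - \frac{\Sigma d}{N},\quad \text{etc.}$$ 

Thus, 

$$\frac{\Sigma x^3}{N} = \frac{\Sigma \left(d - \dfrac{\Sigma d}{N}\right)^3}{N}$$ 

$$= \frac{\Sigma d^3 - 3\,\dfrac{\Sigma d}{N}\,\Sigma d^2 + 3\left(\dfrac{\Sigma d}{N}\right)^2 \Sigma d - N\left(\dfrac{\Sigma d}{N}\right)^3}{N}$$ 

$$= \frac{\Sigma d^3}{N} - 3\,\frac{\Sigma d}{N}\,\frac{\Sigma d^2}{N} + 3\left(\frac{\Sigma d}{N}\right)^3 - \left(\frac{\Sigma d}{N}\right)^3$$ 

$$= \frac{\Sigma d^3}{N} - 3\,\frac{\Sigma d}{N}\,\frac{\Sigma d^2}{N} + 2\left(\frac{\Sigma d}{N}\right)^3.$$ 

For a frequency distribution this becomes 

$$\frac{\Sigma f x^3}{N} = \frac{\Sigma f d^3}{N} - 3\,\frac{\Sigma f d}{N}\,\frac{\Sigma f d^2}{N} + 2\left(\frac{\Sigma f d}{N}\right)^3,$$ 

or, in terms of class intervals cubed, the same quantity is 

$$\frac{\Sigma f (d')^3}{N} - 3\,\frac{\Sigma f d'}{N}\,\frac{\Sigma f (d')^2}{N} + 2\left(\frac{\Sigma f d'}{N}\right)^3.$$ 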
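
The third-moment identity can be confirmed numerically. In the sketch below, the data and function names are assumptions made for illustration. The moment is computed directly from deviations about the mean, and again from deviations about an arbitrary origin.

```python
def third_moment_direct(values):
    """Third moment about the arithmetic mean (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 3 for v in values) / n

def third_moment_from_origin(values, origin):
    """Third moment about the mean, from deviations d about an arbitrary origin."""
    n = len(values)
    d = [v - origin for v in values]
    m1 = sum(d) / n
    m2 = sum(x ** 2 for x in d) / n
    m3 = sum(x ** 3 for x in d) / n
    return m3 - 3 * m1 * m2 + 2 * m1 ** 3

values = [12.0, 15.0, 9.0, 20.0, 14.0]   # hypothetical series
print(third_moment_direct(values))
print(third_moment_from_origin(values, 10.0))
```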


Section 12.1 

The Least-Squares Criterion 

The following discussion assumes that the distribution of chance errors 
follows the normal curve, and that the best central value from which to 
measure such accidental deviations is therefore that value which makes it 
most probable that the deviations are distributed normally. 
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Let a series of such deviations, or errors, and'the interval within which 
they fall be designated by the following symbols: 

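$x_1$ is an item falling at the mid-point of a very small interval, $\Delta x_1$; 
$x_2$ is an item falling at the mid-point of a very small interval, $\Delta x_2$; 
. . . 
$x_N$ is an item falling at the mid-point of a very small interval, $\Delta x_N$. 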

Now the probability that a deviation will fall within a certain interval is 

$$P = \frac{\text{Area of frequency curve within boundaries of that interval}}{\text{Area of entire frequency curve}}.$$ 

Thus the probability of obtaining an error $x_1$ which falls within the interval 
is approximately the ratio of the area of a rectangle, with base of 
$\Delta x_1$ and height the ordinate at the mid-point of the interval, to the area 
of the entire frequency curve. 

If this curve is the normal curve, this probability is 

$$\frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{x_1^2}{2\sigma^2}}\,\Delta x_1,$$ 

since the expression for the ordinate of a normal curve as a ratio to the 
entire number of frequencies is $Y_c = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}$. 

The probability of obtaining errors $x_2$, $x_3$, etc., falling within specified 
intervals is similarly obtained. 

The probability that several independent events will occur is the 
product of the individual probabilities of the separate events. Therefore, 
the probability that the particular set of errors will occur which we have 
assumed (that is, a normal distribution of errors) is as follows: 



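$$P = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{N} e^{-\frac{x_1^2 + x_2^2 + \cdots + x_N^2}{2\sigma^2}} \times \Delta x_1 \times \Delta x_2 \times \cdots \times \Delta x_N.$$ 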





Since any number greater than 1 raised to a negative power is greatest when the 
absolute value of that exponent is least, $P$ is greatest when $x_1^2 + x_2^2 + \cdots + x_N^2$ is least. 
Therefore, the probability that accidental deviations from some central 
value will follow the normal curve is greatest when the sum of the squared 
deviations from that central value is at a minimum. 


Section 12.2 

Derivation of the Normal Equations for a Straight Line
Fitted by the Method of Least Squares

If $Y_c$ is a trend, or computed, value, $Y - Y_c$ is a deviation from trend.
To satisfy the least-squares criterion, $\Sigma(Y - Y_c)^2$ must be a minimum.
Since the straight-line equation type is $Y_c = a + bX$,

\[ \Sigma(Y - Y_c)^2 = \Sigma[Y - (a + bX)]^2 = \Sigma(Y - a - bX)^2. \]

Expanding, this expression becomes

\[ \Sigma Y^2 - 2a\Sigma Y - 2b\Sigma XY + Na^2 + 2ab\Sigma X + b^2\Sigma X^2. \tag{1} \]


If this expression is minimized with respect to $a$ and $b$, we shall obtain the two normal
equations. Rewriting expression (1) according to descending powers of
$a$ gives

\[ Na^2 + 2a(b\Sigma X - \Sigma Y) + \Sigma Y^2 - 2b\Sigma XY + b^2\Sigma X^2. \]

This is a quadratic of the type $pm^2 + qm + r$, where $p$ is $N$, $m$ is $a$, $q$ is
$2(b\Sigma X - \Sigma Y)$, and $r$ is $\Sigma Y^2 - 2b\Sigma XY + b^2\Sigma X^2$. If $p$ is positive (as
it must always be for statistical problems when $p = N$), such a quadratic
has a minimum value when $m = -\dfrac{q}{2p}$. Therefore,

\[ a = \frac{-2(b\Sigma X - \Sigma Y)}{2N} = \frac{\Sigma Y - b\Sigma X}{N}. \tag{2} \]

Rewriting (2) gives

\[ \Sigma Y = Na + b\Sigma X, \]

the first normal equation.
Rearranging expression (1) according to descending powers of $b$ gives

\[ b^2\Sigma X^2 + 2b(a\Sigma X - \Sigma XY) + \Sigma Y^2 - 2a\Sigma Y + Na^2. \tag{3} \]

In this quadratic, $p$ is $\Sigma X^2$, $m$ is $b$, $q$ is $2(a\Sigma X - \Sigma XY)$, and $r$ is
$\Sigma Y^2 - 2a\Sigma Y + Na^2$. Since $\Sigma X^2$ is positive, expression (3) will have a minimum
value when $m = -\dfrac{q}{2p}$, so

\[ b = \frac{-2(a\Sigma X - \Sigma XY)}{2\Sigma X^2} = \frac{\Sigma XY - a\Sigma X}{\Sigma X^2}. \tag{4} \]

Rewriting (4) gives

\[ \Sigma XY = a\Sigma X + b\Sigma X^2, \]

the second normal equation.
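
As an illustration, the two normal equations can be solved directly for a and b. The sketch below uses hypothetical data and a helper name (`fit_line`) chosen only for this example; it eliminates a between the two equations and then checks that both equations are satisfied.

```python
def fit_line(xs, ys):
    """Solve the two normal equations for a and b in Yc = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Eliminate a between  sum(Y) = N*a + b*sum(X)  and  sum(XY) = a*sum(X) + b*sum(X^2).
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

xs = [0, 1, 2, 3, 4]      # hypothetical X values
ys = [2, 4, 5, 4, 5]      # hypothetical Y values
a, b = fit_line(xs, ys)

# Residuals of the two normal equations; both should be zero up to rounding.
first = sum(ys) - (len(xs) * a + b * sum(xs))
second = sum(x * y for x, y in zip(xs, ys)) - (a * sum(xs) + b * sum(x * x for x in xs))
print(a, b, first, second)
```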
Section 13.1 

Derivation of the Equations for Fitting Growth Curve of the
Type $Y_c = k + ab^X$

Designating by $n$ the number of years in each third of the data, the
first equation (see Equation I, p. 301) is:

\[ \Sigma_1 Y = nk + a + ab + ab^2 + ab^3 + \cdots + ab^{n-1} = nk + a[1 + b + b^2 + b^3 + \cdots + b^{n-1}]. \tag{1} \]

If now the expression inside the brackets be multiplied by $\dfrac{b - 1}{b - 1}$, we have

\[ \frac{[1 + b + b^2 + b^3 + \cdots + b^{n-1}](b - 1)}{b - 1} = \frac{b + b^2 + b^3 + \cdots + b^{n-1} + b^n - 1 - b - b^2 - b^3 - \cdots - b^{n-1}}{b - 1} \tag{2} \]

\[ = \frac{b^n - 1}{b - 1}. \]

The fourth term shown in the numerator of expression (2) is $b^{n-1}$. This
follows from the fact that the next-to-the-last term within the brackets
of expression (1) may also be designated as $b^{n-2}$, and $b^{n-2} \times b = b^{n-1}$.

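All three equations are obtained in a similar fashion. They are:

\[ \text{I.}\quad \Sigma_1 Y = nk + a\left(\frac{b^n - 1}{b - 1}\right) \]

\[ \text{II.}\quad \Sigma_2 Y = nk + ab^n\left(\frac{b^n - 1}{b - 1}\right) \]

\[ \text{III.}\quad \Sigma_3 Y = nk + ab^{2n}\left(\frac{b^n - 1}{b - 1}\right) \]

Equations A, B, and C now are:

\[ \text{A.}\quad \Sigma_2 Y - \Sigma_1 Y = a\left(\frac{b^n - 1}{b - 1}\right)(b^n - 1) = a\,\frac{(b^n - 1)^2}{b - 1} \]

\[ \text{B.}\quad \Sigma_3 Y - \Sigma_2 Y = ab^n\,\frac{(b^n - 1)^2}{b - 1} \]

\[ \text{C.}\quad \frac{\Sigma_3 Y - \Sigma_2 Y}{\Sigma_2 Y - \Sigma_1 Y} = ab^n\,\frac{(b^n - 1)^2}{b - 1} \div a\,\frac{(b^n - 1)^2}{b - 1} = b^n. \]

Therefore,

\[ b = \sqrt[n]{\frac{\Sigma_3 Y - \Sigma_2 Y}{\Sigma_2 Y - \Sigma_1 Y}}. \]

Equation A gives us the formula for $a$:

\[ \Sigma_2 Y - \Sigma_1 Y = a\,\frac{(b^n - 1)^2}{b - 1}, \qquad a = (\Sigma_2 Y - \Sigma_1 Y)\,\frac{b - 1}{(b^n - 1)^2}. \]

From Equation I we find:

\[ \Sigma_1 Y = nk + a\left(\frac{b^n - 1}{b - 1}\right), \qquad k = \frac{1}{n}\left[\Sigma_1 Y - a\left(\frac{b^n - 1}{b - 1}\right)\right]. \]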
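
The three formulas for b, a, and k can be exercised on a series that follows the curve exactly. This is a sketch under stated assumptions: the function name `fit_modified_exponential` and the generating constants (k = 100, a = -50, b = 0.8, nine periods with X starting at 0) are hypothetical, chosen only so the recovered constants can be compared with known ones.

```python
def fit_modified_exponential(ys):
    """Fit Yc = k + a*b**X from the partial sums of the three thirds of the data.

    Assumes X = 0, 1, 2, ... and a series length divisible by three.
    """
    if len(ys) % 3 != 0:
        raise ValueError("series length must be divisible by three")
    n = len(ys) // 3
    s1, s2, s3 = sum(ys[:n]), sum(ys[n:2 * n]), sum(ys[2 * n:])
    ratio = (s3 - s2) / (s2 - s1)      # equals b**n (Equation C); must be positive
    b = ratio ** (1.0 / n)
    a = (s2 - s1) * (b - 1) / (b ** n - 1) ** 2
    k = (s1 - a * (b ** n - 1) / (b - 1)) / n
    return k, a, b

# Hypothetical series generated from known constants.
ys = [100 - 50 * 0.8 ** x for x in range(9)]
k, a, b = fit_modified_exponential(ys)
print(k, a, b)
```

With real data the three partial sums will not fit any such curve exactly, so the constants obtained are only those of the curve passing through the three partial totals.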




Section 19.1 

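To prove that $\bar Y_c = \bar Y$.

\[ Y_c = a + bX. \]

\[ \Sigma Y_c = \Sigma(a + bX) = Na + b\Sigma X. \]

But $Na + b\Sigma X = \Sigma Y$ (Normal equation I).

Therefore,

\[ \Sigma Y_c = \Sigma Y; \tag{1} \]

\[ \frac{\Sigma Y_c}{N} = \frac{\Sigma Y}{N}; \text{ and} \]

\[ \bar Y_c = \bar Y. \tag{2} \]

To prove that $\Sigma Y_c^2 = a\Sigma Y + b\Sigma XY$.

\[ \Sigma Y_c^2 = \Sigma(a + bX)^2 = \Sigma(a^2 + 2abX + b^2X^2) \]

\[ = Na^2 + 2ab\Sigma X + b^2\Sigma X^2 = a(Na + b\Sigma X) + b(a\Sigma X + b\Sigma X^2). \]

But $Na + b\Sigma X = \Sigma Y$ (Normal equation I), and
$a\Sigma X + b\Sigma X^2 = \Sigma XY$ (Normal equation II).

Therefore,

\[ \Sigma Y_c^2 = a\Sigma Y + b\Sigma XY. \tag{3} \]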
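To prove that $\Sigma y_c^2 = \Sigma Y_c^2 - \bar Y\Sigma Y$.

By the procedure shown in footnote 3 of Chapter 21, it may be
shown that

\[ \Sigma y^2 = \Sigma Y^2 - \bar Y\Sigma Y. \]

Similarly, it is true that $\Sigma y_c^2 = \Sigma Y_c^2 - \bar Y_c\Sigma Y_c$.

But $\bar Y_c = \bar Y$ (Equation 2) and $\Sigma Y_c = \Sigma Y$ (Equation 1).

Therefore,

\[ \Sigma y_c^2 = \Sigma Y_c^2 - \bar Y\Sigma Y. \tag{4} \]

To prove that $\Sigma y_s^2 = \Sigma Y^2 - \Sigma Y_c^2$.

\[ \Sigma y_s^2 = \Sigma(Y - Y_c)^2 = \Sigma Y^2 - 2\Sigma YY_c + \Sigma Y_c^2. \]

But $Y_c = a + bX$; hence,

\[ \Sigma YY_c = \Sigma[Y(a + bX)] = \Sigma(aY + bXY) = a\Sigma Y + b\Sigma XY. \]

Now $a\Sigma Y + b\Sigma XY = \Sigma Y_c^2$ (Equation 3).

Therefore,

\[ \Sigma y_s^2 = \Sigma Y^2 - 2\Sigma Y_c^2 + \Sigma Y_c^2 = \Sigma Y^2 - \Sigma Y_c^2. \tag{5} \]

To prove that $\Sigma y_c^2 = b\Sigma xy$.

\[ \Sigma y_c^2 = \Sigma(bx)^2 = b^2\Sigma x^2 = b\,\frac{\Sigma xy}{\Sigma x^2}\,\Sigma x^2 = b\Sigma xy. \tag{6} \]

To prove that $\Sigma y_s^2 = \Sigma y^2 - \Sigma y_c^2$.

\[ \Sigma y_s^2 = \Sigma Y^2 - \Sigma Y_c^2. \quad \text{(Equation 5)} \]

But

\[ \Sigma Y^2 = \Sigma y^2 + \bar Y\Sigma Y, \text{ and} \]

\[ \Sigma Y_c^2 = \Sigma y_c^2 + \bar Y\Sigma Y. \quad \text{(Equation 4)} \]

Therefore,

\[ \Sigma y_s^2 = (\Sigma y^2 + \bar Y\Sigma Y) - (\Sigma y_c^2 + \bar Y\Sigma Y) = \Sigma y^2 - \Sigma y_c^2. \tag{7} \]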
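
The identities proved in this section can be verified on a small example. The sketch below uses hypothetical data; the variable names (`total`, `explained`, `unexplained`) are labels chosen here for Σy², Σy_c², and Σy_s² and are not the book's terms.

```python
xs = [1, 2, 3, 4, 5]      # hypothetical X values
ys = [2, 4, 5, 4, 5]      # hypothetical Y values
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Deviations from the means (lower-case x and y in the text).
dx = [x - x_bar for x in xs]
dy = [y - y_bar for y in ys]
sum_xy = sum(p * q for p, q in zip(dx, dy))
sum_x2 = sum(p * p for p in dx)

b = sum_xy / sum_x2                   # slope of the least-squares line
a = y_bar - b * x_bar                 # intercept
yc = [a + b * x for x in xs]          # computed values Yc

total = sum(q * q for q in dy)                              # sum of y^2
explained = sum((v - y_bar) ** 2 for v in yc)               # sum of y_c^2
unexplained = sum((y - v) ** 2 for y, v in zip(ys, yc))     # sum of y_s^2
print(total, explained, unexplained, b * sum_xy)
```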

Section 19.2 

Derivation of Constants for Straight-Line Equation
when Origin Is at $\bar X$, $\bar Y$

The normal equations for fitting a straight line by the method of least
squares are

\[ \Sigma Y = Na + b\Sigma X; \]

\[ \Sigma XY = a\Sigma X + b\Sigma X^2. \]

If the origin be taken at $\bar X$, $\bar Y$ instead of 0, 0, we have

\[ \Sigma y = Na + b\Sigma x; \]

\[ \Sigma xy = a\Sigma x + b\Sigma x^2. \]

But $\Sigma y = 0$, and $\Sigma x = 0$.

Therefore, $a = 0$, and $b = \dfrac{\Sigma xy}{\Sigma x^2}$.

The estimating equation becomes $y_c = bx$ instead of $Y_c = a + bX$.


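Section 19.3

To prove that

\[ \frac{\Sigma y_c^2}{\Sigma y^2} = \frac{(\Sigma xy)^2}{\Sigma x^2\,\Sigma y^2}. \]

Since $y_c = bx$, we may write

\[ \frac{\Sigma y_c^2}{\Sigma y^2} = \frac{b^2\Sigma x^2}{\Sigma y^2}. \]

From the second normal equation, $b = \dfrac{\Sigma xy}{\Sigma x^2}$.

Therefore,

\[ \frac{\Sigma y_c^2}{\Sigma y^2} = \left(\frac{\Sigma xy}{\Sigma x^2}\right)^2 \frac{\Sigma x^2}{\Sigma y^2} = \frac{(\Sigma xy)^2}{\Sigma x^2\,\Sigma y^2}. \]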

Section 19.4 

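To prove that

\[ r = \frac{\Sigma xy}{N s_X s_Y} = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}. \]

\[ \Sigma xy = \Sigma[(X - \bar X)(Y - \bar Y)] = \Sigma(XY - \bar XY - X\bar Y + \bar X\bar Y) \]

\[ = \Sigma XY - \bar X\Sigma Y - \bar Y\Sigma X + N\bar X\bar Y \]

\[ = \Sigma XY - N\bar X\bar Y - N\bar X\bar Y + N\bar X\bar Y \]

\[ = \Sigma XY - N\bar X\bar Y. \]

\[ s_X s_Y = \sqrt{\frac{\Sigma X^2}{N} - \left(\frac{\Sigma X}{N}\right)^2}\;\sqrt{\frac{\Sigma Y^2}{N} - \left(\frac{\Sigma Y}{N}\right)^2}. \]

Therefore,

\[ \frac{\Sigma xy}{N s_X s_Y} = \frac{\Sigma XY - N\bar X\bar Y}{N\sqrt{\dfrac{\Sigma X^2}{N} - \left(\dfrac{\Sigma X}{N}\right)^2}\;\sqrt{\dfrac{\Sigma Y^2}{N} - \left(\dfrac{\Sigma Y}{N}\right)^2}} \]

\[ = \frac{\Sigma XY - N\left(\dfrac{\Sigma X}{N}\right)\left(\dfrac{\Sigma Y}{N}\right)}{\dfrac{1}{N}\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}} \]

\[ = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}. \]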
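
The equivalence of the deviation form and the raw-sum form of r may be confirmed numerically. This is only an illustrative sketch; the data and both function names are hypothetical.

```python
import math

def r_from_deviations(xs, ys):
    """r as the sum of xy divided by N times the two standard deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return sum_xy / (n * s_x * s_y)

def r_from_raw_sums(xs, ys):
    """r computed from sums of the original X and Y values only."""
    n = len(xs)
    numerator = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    denominator = math.sqrt(
        (n * sum(x * x for x in xs) - sum(xs) ** 2)
        * (n * sum(y * y for y in ys) - sum(ys) ** 2)
    )
    return numerator / denominator

xs = [1, 2, 3, 4, 5]      # hypothetical X values
ys = [2, 4, 5, 4, 5]      # hypothetical Y values
r_dev = r_from_deviations(xs, ys)
r_raw = r_from_raw_sums(xs, ys)
print(r_dev, r_raw)
```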


Section 19.5 

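Given that $X_1, X_2, \cdots, X_N$ can take values only of the integers 1
through $N$, without duplication or omission, and that the same is true of
$Y_1, Y_2, \cdots, Y_N$.

To prove that

\[ r_{rank} = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)}. \]

Paralleling the proof given in Section 24.4 for arithmetic means, it may
be shown that

\[ s_D^2 = s_X^2 + s_Y^2 - 2r s_X s_Y, \]

where $D = X - Y$. For ranks the mean of $D$ is zero, so that $s_D^2 = \Sigma D^2 / N$. From this relationship it follows that

\[ r = \frac{s_X^2 + s_Y^2 - \dfrac{\Sigma D^2}{N}}{2 s_X s_Y}. \]

But $s_X^2 = s_Y^2$ when we are dealing with ranks. Therefore,

\[ r_{rank} = \frac{2s_X^2 - \dfrac{\Sigma D^2}{N}}{2s_X^2} = 1 - \frac{\Sigma D^2}{2N s_X^2}. \]

Now $\Sigma X$ is the sum of the first $N$ natural numbers, or $\dfrac{N(N + 1)}{2}$, so that

\[ \bar X = \frac{N + 1}{2}; \]

and $\Sigma X^2$ is the sum of the squares of the first $N$ natural numbers, or

\[ \frac{N(N + 1)(2N + 1)}{6}. \]

\[ N s_X^2 = \Sigma(X - \bar X)^2 = \Sigma X^2 - \bar X\Sigma X \]

\[ = \frac{N(N + 1)(2N + 1)}{6} - \frac{N + 1}{2}\cdot\frac{N(N + 1)}{2} \]

\[ = \frac{2N(N + 1)(2N + 1) - 3N(N + 1)^2}{12} \]

\[ = \frac{N(N^2 - 1)}{12}. \]

Substituting in the expression for $r_{rank}$, we have

\[ r_{rank} = 1 - \frac{\Sigma D^2}{\dfrac{N(N^2 - 1)}{6}} = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)}. \]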
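
A small check of this result: for ranks without ties, the rank formula should agree with the product-moment coefficient computed from the same ranks. The ranks and function names below are hypothetical and serve only as a sketch.

```python
import math

def rank_correlation(rank_x, rank_y):
    """1 - 6*sum(D^2) / (N*(N^2 - 1)), valid for untied ranks 1..N."""
    n = len(rank_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

def product_moment(xs, ys):
    """Ordinary product-moment r computed from deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    sum_y2 = sum((y - y_bar) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

rank_x = [1, 2, 3, 4, 5]      # hypothetical ranks
rank_y = [2, 1, 4, 3, 5]
r_rank = rank_correlation(rank_x, rank_y)
r_pm = product_moment(rank_x, rank_y)
print(r_rank, r_pm)
```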


Section 20.1 

The point of diminishing absolute returns is the highest point in the
total returns curve. At this point the slope is zero. The slope of a
curve at any point may be found by taking the first derivative of the
equation. The first derivative of the equation $Y_c = a + bX + cX^2 + dX^3$ is

\[ \frac{dY_c}{dX} = b + 2cX + 3dX^2. \]

Setting $\dfrac{dY_c}{dX} = 0$, we have

\[ X = \frac{-c \pm \sqrt{c^2 - 3bd}}{3d}. \]

For the total returns equation $Y_c = 890.32 + 78.264X + 20.324X^2 - 4.4649X^3$,
the above equation yields $X = -1.337$ and $4.371$. When the
slope is zero, we have a maximum or a minimum point. Only positive
values of $X$ are of interest, and inspection of Chart 20.3 indicates that a
maximum is reached when $X$ is close to 4. Or, if the reader will compute
$Y_c$ values in the neighborhood of $X = -1.337$ and $X = 4.371$, he will
discover that the former is a minimum and the latter a maximum. When
$X = 4.371$, the computed total returns $Y_c = 1,247.85$. The point of
diminishing total returns is reached when the input of nitrogen is 4.371
per cent. At this point the estimated yield is 1,247.85 pounds.

The point of diminishing marginal returns is the point of inflection in
the curve. It is the point where the change in the slope is zero. The
change in the slope is the second derivative of the estimating equation.
Thus,

\[ \frac{d^2Y_c}{dX^2} = 2c + 6dX. \]

Setting $\dfrac{d^2Y_c}{dX^2} = 0$, we have

\[ X = -\frac{c}{3d}. \]

For the total returns equation, the point of inflection is $X = 1.517$.
Thus the point of diminishing marginal returns is reached when the input
of nitrogen is 1.517 per cent. At this point the estimated yield is $Y_c =
1,040.23$ pounds.
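
The two points can be recomputed from the constants of the total returns equation. The sketch below uses only the coefficients quoted in the text; the variable and function names are chosen for this example.

```python
import math

# Constants of Yc = a + bX + cX^2 + dX^3 as quoted in the text.
a, b, c, d = 890.32, 78.264, 20.324, -4.4649

def yc(x):
    """Computed total returns at input X."""
    return a + b * x + c * x ** 2 + d * x ** 3

def second_derivative(x):
    """Change in the slope, 2c + 6dX."""
    return 2 * c + 6 * d * x

# Roots of the first derivative b + 2cX + 3dX^2 = 0.
root = math.sqrt(c * c - 3 * b * d)
x_low, x_high = sorted([(-c + root) / (3 * d), (-c - root) / (3 * d)])

# Root of the second derivative: the point of inflection.
x_inflection = -c / (3 * d)

print(round(x_low, 3), round(x_high, 3), round(x_inflection, 3))
print(second_derivative(x_low) > 0, second_derivative(x_high) < 0)
print(round(yc(x_high), 2))
```

The sign of the second derivative at each root distinguishes the minimum from the maximum without tabulating neighboring values.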

Section 21.1 

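Proof that

\[ \left(\frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}\right)^2 = \frac{\Sigma x_{c1.23}^2 - \Sigma x_{c1.3}^2}{\Sigma x_1^2 - \Sigma x_{c1.3}^2}. \]

A demonstration for the other formulas of these types would proceed
along similar lines.

\[ \left(\frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}\right)^2 = \frac{r_{12}^2 - 2r_{12}r_{13}r_{23} + r_{13}^2r_{23}^2}{1 - r_{13}^2 - r_{23}^2 + r_{13}^2r_{23}^2}. \tag{1} \]

$r_{12} = \dfrac{\Sigma x_1x_2}{\sqrt{\Sigma x_1^2\,\Sigma x_2^2}}$, and similar formulas obtain for the
other $r$'s. Therefore:

\[ \left(\frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}\right)^2 = \frac{\dfrac{(\Sigma x_1x_2)^2}{\Sigma x_1^2\Sigma x_2^2} - \dfrac{2\,\Sigma x_1x_2\,\Sigma x_1x_3\,\Sigma x_2x_3}{\Sigma x_1^2\Sigma x_2^2\Sigma x_3^2} + \dfrac{(\Sigma x_1x_3)^2(\Sigma x_2x_3)^2}{\Sigma x_1^2\Sigma x_2^2(\Sigma x_3^2)^2}}{1 - \dfrac{(\Sigma x_1x_3)^2}{\Sigma x_1^2\Sigma x_3^2} - \dfrac{(\Sigma x_2x_3)^2}{\Sigma x_2^2\Sigma x_3^2} + \dfrac{(\Sigma x_1x_3)^2(\Sigma x_2x_3)^2}{\Sigma x_1^2\Sigma x_2^2(\Sigma x_3^2)^2}}. \]

Multiplying numerator and denominator by $\Sigma x_1^2\Sigma x_2^2(\Sigma x_3^2)^2$, this simplifies
to the following equation:

\[ \frac{(\Sigma x_1x_2)^2(\Sigma x_3^2)^2 - 2\,\Sigma x_1x_2\,\Sigma x_1x_3\,\Sigma x_2x_3\,\Sigma x_3^2 + (\Sigma x_1x_3)^2(\Sigma x_2x_3)^2}{\Sigma x_1^2\Sigma x_2^2(\Sigma x_3^2)^2 - \Sigma x_2^2\Sigma x_3^2(\Sigma x_1x_3)^2 - \Sigma x_1^2\Sigma x_3^2(\Sigma x_2x_3)^2 + (\Sigma x_1x_3)^2(\Sigma x_2x_3)^2}. \tag{2} \]

We know that

\[ r_{12.3}^2 = \frac{\Sigma x_{c1.23}^2 - \Sigma x_{c1.3}^2}{\Sigma x_1^2 - \Sigma x_{c1.3}^2}. \tag{3} \]

But

\[ \Sigma x_{c1.3}^2 = b_{13}\Sigma x_1x_3 = \frac{\Sigma x_1x_3}{\Sigma x_3^2}\,\Sigma x_1x_3 = \frac{(\Sigma x_1x_3)^2}{\Sigma x_3^2}. \]

Also, $\Sigma x_{c1.23}^2 = b_{12.3}\Sigma x_1x_2 + b_{13.2}\Sigma x_1x_3$.

Now, the normal equations for obtaining $b_{12.3}$ and $b_{13.2}$ are:

\[ \text{II.}\quad \Sigma x_1x_2 = b_{12.3}\Sigma x_2^2 + b_{13.2}\Sigma x_2x_3; \]

\[ \text{III.}\quad \Sigma x_1x_3 = b_{12.3}\Sigma x_2x_3 + b_{13.2}\Sigma x_3^2. \]

In order to solve for $b_{13.2}$, we may multiply Equation II by $\Sigma x_2x_3$ and
Equation III by $\Sigma x_2^2$ and subtract Equation II from Equation III.
Thus,

\[ \text{II.}\quad \Sigma x_1x_2\,\Sigma x_2x_3 = b_{12.3}\Sigma x_2^2\,\Sigma x_2x_3 + b_{13.2}(\Sigma x_2x_3)^2 \]

\[ \text{III.}\quad \Sigma x_1x_3\,\Sigma x_2^2 = b_{12.3}\Sigma x_2^2\,\Sigma x_2x_3 + b_{13.2}\Sigma x_2^2\Sigma x_3^2 \]

\[ \Sigma x_1x_3\,\Sigma x_2^2 - \Sigma x_1x_2\,\Sigma x_2x_3 = b_{13.2}\Sigma x_2^2\Sigma x_3^2 - b_{13.2}(\Sigma x_2x_3)^2 \]

\[ b_{13.2} = \frac{\Sigma x_1x_3\,\Sigma x_2^2 - \Sigma x_1x_2\,\Sigma x_2x_3}{\Sigma x_2^2\Sigma x_3^2 - (\Sigma x_2x_3)^2}. \]

In a similar fashion, we may solve for $b_{12.3}$. This involves multiplying
Equation II by $\Sigma x_3^2$ and Equation III by $\Sigma x_2x_3$. By such a process we
find that

\[ b_{12.3} = \frac{\Sigma x_1x_3\,\Sigma x_2x_3 - \Sigma x_1x_2\,\Sigma x_3^2}{(\Sigma x_2x_3)^2 - \Sigma x_2^2\Sigma x_3^2}. \]

Substituting these expressions for $b_{13.2}$ and $b_{12.3}$ in the equation for $\Sigma x_{c1.23}^2$,
we have

\[ \Sigma x_{c1.23}^2 = \frac{\Sigma x_1x_3\,\Sigma x_2x_3 - \Sigma x_1x_2\,\Sigma x_3^2}{(\Sigma x_2x_3)^2 - \Sigma x_2^2\Sigma x_3^2}\,\Sigma x_1x_2 + \frac{\Sigma x_1x_3\,\Sigma x_2^2 - \Sigma x_1x_2\,\Sigma x_2x_3}{\Sigma x_2^2\Sigma x_3^2 - (\Sigma x_2x_3)^2}\,\Sigma x_1x_3. \]

This simplifies to

\[ \Sigma x_{c1.23}^2 = \frac{(\Sigma x_1x_3)^2\Sigma x_2^2 + (\Sigma x_1x_2)^2\Sigma x_3^2 - 2\,\Sigma x_1x_2\,\Sigma x_1x_3\,\Sigma x_2x_3}{\Sigma x_2^2\Sigma x_3^2 - (\Sigma x_2x_3)^2}. \]

Now substituting our expressions for $\Sigma x_{c1.23}^2$ and $\Sigma x_{c1.3}^2$ in Formula (3), we
have

\[ r_{12.3}^2 = \frac{\dfrac{(\Sigma x_1x_3)^2\Sigma x_2^2 + (\Sigma x_1x_2)^2\Sigma x_3^2 - 2\,\Sigma x_1x_2\,\Sigma x_1x_3\,\Sigma x_2x_3}{\Sigma x_2^2\Sigma x_3^2 - (\Sigma x_2x_3)^2} - \dfrac{(\Sigma x_1x_3)^2}{\Sigma x_3^2}}{\Sigma x_1^2 - \dfrac{(\Sigma x_1x_3)^2}{\Sigma x_3^2}}. \]

Expanding and simplifying, this expression becomes Equation (2).
Therefore,

\[ \left(\frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}\right)^2 = \frac{\Sigma x_{c1.23}^2 - \Sigma x_{c1.3}^2}{\Sigma x_1^2 - \Sigma x_{c1.3}^2}. \]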
Section 24.1

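To prove that

\[ \frac{\bar X_1 + \bar X_2 + \cdots + \bar X_K}{K} = \bar X_{\mathcal{P}} \quad \text{when } N_1 = N_2 = \cdots = N_K = N. \]

\[ \frac{\bar X_1 + \bar X_2 + \cdots + \bar X_K}{K} = \frac{\dfrac{\Sigma X_1}{N} + \dfrac{\Sigma X_2}{N} + \cdots + \dfrac{\Sigma X_K}{N}}{K} = \frac{\Sigma X_1 + \Sigma X_2 + \cdots + \Sigma X_K}{NK}. \]

Each random sample of $N$ items contains $\dfrac{N}{\mathcal{P}}$ of the population, and each
item will occur $\dfrac{N}{\mathcal{P}}K$ times. Therefore,

\[ \frac{\Sigma X_1 + \Sigma X_2 + \cdots + \Sigma X_K}{NK} = \frac{\dfrac{N}{\mathcal{P}}K\displaystyle\sum_1^{\mathcal{P}} X}{NK}, \]

where $\sum_1^{\mathcal{P}}$ indicates a summation over the items in the population,

\[ = \frac{\displaystyle\sum_1^{\mathcal{P}} X}{\mathcal{P}} = \bar X_{\mathcal{P}}. \]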

Section 24.2 


To prove that $\sigma_{\bar X} = \dfrac{\sigma}{\sqrt{N}}$ when $N_1 = N_2 = \cdots = N_K = N$.

The scheme of the random samples appears as follows:

Item      Sample 1      Sample 2      Sample 3
a         $X_{a1}$      $X_{a2}$      $X_{a3}$
b         $X_{b1}$      $X_{b2}$      $X_{b3}$
c         $X_{c1}$      $X_{c2}$      $X_{c3}$
. . .
N         $X_{N1}$      $X_{N2}$      $X_{N3}$

There are $K$ samples. The individual items are replaced after each
sample has been drawn.

We shall use

$\sum_1^K$ to indicate a summation over the $K$ samples;

$\sum_1^{\mathcal{P}}$ to indicate a summation over the items in the population;

$\Sigma$ to indicate a summation over a sample, or over a particular sample if a
subscript follows $X$; thus, $\Sigma X_1$ is the sum of the $X$ values in sample 1;
and

$x$ to mean $X - \bar X_{\mathcal{P}}$, a usage of $x$ employed only in this proof.

The deviations of the items from the population mean are $x_{a1} = X_{a1} - \bar X_{\mathcal{P}}$,
$x_{b1} = X_{b1} - \bar X_{\mathcal{P}}$, $\cdots$, $x_{N1} = X_{N1} - \bar X_{\mathcal{P}}$, $x_{a2} = X_{a2} - \bar X_{\mathcal{P}}$, etc.

We can therefore write the various items as $\bar X_{\mathcal{P}} + x_{a1}$, $\cdots$,
$\bar X_{\mathcal{P}} + x_{N1}$, $\bar X_{\mathcal{P}} + x_{a2}$, etc.

For Sample 1: $\Sigma X_1 = N\bar X_{\mathcal{P}} + \Sigma x_1$,

For Sample 2: $\Sigma X_2 = N\bar X_{\mathcal{P}} + \Sigma x_2$,
and so forth,

where $\Sigma x_1 \neq 0$, $\Sigma x_2 \neq 0$, etc., since $x = X - \bar X_{\mathcal{P}}$.

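Adding a constant to (or subtracting a constant from) a series of values
does not alter the value of the standard deviation of those values, so that
$\sigma_{\Sigma X} = \sigma_{\Sigma x}$.

For the $K$ samples,

\[ \sigma_{\Sigma x}^2 = \frac{\displaystyle\sum_1^K(\Sigma x)^2}{K} - \left[\frac{\displaystyle\sum_1^K(\Sigma x)}{K}\right]^2 = \frac{\displaystyle\sum_1^K(\Sigma x)^2}{K}, \]

since

\[ \sum_1^K(\Sigma x) = \Sigma x_1 + \Sigma x_2 + \cdots + \Sigma x_K = 0, \]

and

\[ K\sigma_{\Sigma x}^2 = \sum_1^K(\Sigma x)^2 = \sum_1^K(x_a + x_b + x_c + \cdots + x_N)^2. \]

For any one sample,

\[ (x_a + x_b + x_c + \cdots + x_N)^2 = x_a^2 + x_ax_b + x_ax_c + \cdots + x_ax_N \]
\[ \qquad + x_ax_b + x_b^2 + x_bx_c + \cdots + x_bx_N \]
\[ \qquad + x_ax_c + x_bx_c + x_c^2 + \cdots + x_cx_N \]
\[ \qquad + \cdots \]
\[ \qquad + x_ax_N + x_bx_N + x_cx_N + \cdots + x_N^2 \]
\[ = \Sigma x_i^2 + 2\Sigma x_ix_j, \]

where $x_i$ represents any item and $x_ix_j$ represents the product resulting
from each combination of two different items. Therefore, for the $K$
samples,

\[ K\sigma_{\Sigma x}^2 = \sum_1^K(\Sigma x_i^2 + 2\Sigma x_ix_j) = \sum_1^K\Sigma x_i^2 + 2\sum_1^K\Sigma x_ix_j. \]

Each sample of $N$ items contains $\dfrac{N}{\mathcal{P}}$ of the population, and each item
will occur in $\dfrac{N}{\mathcal{P}}$ of the samples, or $\dfrac{N}{\mathcal{P}}K$ times. If a given item ($x_i$) occurs
in $\dfrac{N}{\mathcal{P}}$ of the samples, a second item ($x_j$) will occur in $\dfrac{N - 1}{\mathcal{P} - 1}$ of the samples
in which the first item occurs, and both items will occur in $\dfrac{N}{\mathcal{P}}\cdot\dfrac{N - 1}{\mathcal{P} - 1}$
of the samples, or $\dfrac{N(N - 1)}{\mathcal{P}(\mathcal{P} - 1)}K$ times. Thus, each $x_ix_j$ will occur
$\dfrac{N(N - 1)}{\mathcal{P}(\mathcal{P} - 1)}K$ times.

Therefore,

\[ K\sigma_{\Sigma x}^2 = \frac{N}{\mathcal{P}}K\sum_1^{\mathcal{P}} x_i^2 + 2\,\frac{N(N - 1)}{\mathcal{P}(\mathcal{P} - 1)}K\sum_1^{\mathcal{P}} x_ix_j, \]

and

\[ \sigma_{\Sigma x}^2 = \frac{N}{\mathcal{P}}\sum_1^{\mathcal{P}} x_i^2 + 2\,\frac{N(N - 1)}{\mathcal{P}(\mathcal{P} - 1)}\sum_1^{\mathcal{P}} x_ix_j. \]

By a development similar to that shown above for $(\Sigma x)^2$ for one sample,
we have

\[ 2\sum_1^{\mathcal{P}} x_ix_j = \left(\sum_1^{\mathcal{P}} x_i\right)^2 - \sum_1^{\mathcal{P}} x_i^2. \]

But $\sum_1^{\mathcal{P}} x_i = 0$. Therefore, $2\sum_1^{\mathcal{P}} x_ix_j = -\sum_1^{\mathcal{P}} x_i^2$, and

\[ \sigma_{\Sigma x}^2 = \frac{N}{\mathcal{P}}\sum_1^{\mathcal{P}} x_i^2 - \frac{N(N - 1)}{\mathcal{P}(\mathcal{P} - 1)}\sum_1^{\mathcal{P}} x_i^2 \]

\[ = \frac{N}{\mathcal{P}}\,\mathcal{P}\sigma^2 - \frac{N(N - 1)}{\mathcal{P}(\mathcal{P} - 1)}\,\mathcal{P}\sigma^2 \]

\[ = N\sigma^2 - \frac{N(N - 1)}{\mathcal{P} - 1}\,\sigma^2 \]

\[ = N\sigma^2\left[1 - \frac{N - 1}{\mathcal{P} - 1}\right] = N\sigma^2\left[\frac{(\mathcal{P} - 1) - (N - 1)}{\mathcal{P} - 1}\right] = N\sigma^2\left[\frac{\mathcal{P} - N}{\mathcal{P} - 1}\right]. \]

\[ \sigma_{\Sigma X} = \sigma\sqrt{N}\sqrt{\frac{\mathcal{P} - N}{\mathcal{P} - 1}}. \]

Since each sample consists of $N$ items, each deviation of a sample sum
from the arithmetic mean of the sample sums is $N$ times as large as each
corresponding deviation of a sample mean from the arithmetic mean of
the sample means, $\bar X_{\mathcal{P}}$, and each squared deviation of a sample sum is $N^2$
times the squared deviation of each sample mean. Therefore, the standard
deviation of the sample sums is $N$ times the standard deviation of the
sample means. Dividing each side of the last equation by $N$ gives

\[ \sigma_{\bar X} = \frac{\sigma}{\sqrt{N}}\sqrt{\frac{\mathcal{P} - N}{\mathcal{P} - 1}}. \]

If $\mathcal{P}$ is infinite, or if $\mathcal{P}$ is finite but large in relation to $N$, so that the
value of $\sqrt{\dfrac{\mathcal{P} - N}{\mathcal{P} - 1}}$ is effectively 1, the expression may be written

\[ \sigma_{\bar X} = \frac{\sigma}{\sqrt{N}}. \]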
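
The finite-population result can be checked by listing every possible sample from a small population. This is a sketch only: the six population values are hypothetical, and the enumeration treats each sample of N distinct items as equally likely, which is the counting used in the proof above.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

population = [2.0, 3.0, 5.0, 8.0, 13.0, 21.0]    # hypothetical population
P = len(population)                              # population size
N = 3                                            # sample size

sigma = pstdev(population)                       # population standard deviation

# Every possible sample of N different items, and the mean of each.
sample_means = [mean(sample) for sample in combinations(population, N)]

observed = pstdev(sample_means)                  # standard deviation of the sample means
predicted = sigma / math.sqrt(N) * math.sqrt((P - N) / (P - 1))
print(observed, predicted)
print(mean(sample_means), mean(population))      # Section 24.1: these agree as well
```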


Section 24.3 


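To show that

\[ \frac{\hat\sigma_1^2 + \hat\sigma_2^2 + \cdots + \hat\sigma_K^2}{K} = \sigma^2, \quad \text{when } N_1 = N_2 = \cdots = N_K = N. \]

The variation of a single sample from $\bar X_{\mathcal{P}}$ is $\sum_1^N(X - \bar X_{\mathcal{P}})^2$. This may
be divided into two parts:

\[ \sum_1^N(X - \bar X_{\mathcal{P}})^2 = \sum_1^N[(X - \bar X) + (\bar X - \bar X_{\mathcal{P}})]^2, \]

where $\bar X$ represents the mean of a sample,

\[ = \sum_1^N[(X - \bar X)^2 + 2(X - \bar X)(\bar X - \bar X_{\mathcal{P}}) + (\bar X - \bar X_{\mathcal{P}})^2] \]

\[ = \sum_1^N(X - \bar X)^2 + 2(\bar X - \bar X_{\mathcal{P}})\sum_1^N(X - \bar X) + N(\bar X - \bar X_{\mathcal{P}})^2. \]

But $\sum_1^N(X - \bar X) = 0$, and, therefore,

\[ \sum_1^N(X - \bar X_{\mathcal{P}})^2 = \sum_1^N(X - \bar X)^2 + N(\bar X - \bar X_{\mathcal{P}})^2. \]

Summing for the $K$ samples,

\[ \sum_1^K\left[\sum_1^N(X - \bar X_{\mathcal{P}})^2\right] = \sum_1^K\left[\sum_1^N(X - \bar X)^2\right] + \sum_1^K[N(\bar X - \bar X_{\mathcal{P}})^2]. \]

Each random sample of $N$ items contains $\dfrac{N}{\mathcal{P}}$ of the population, and each
item will occur $\dfrac{N}{\mathcal{P}}K$ times. Considering each of the three parts of the
preceding expression separately, we have

\[ \sum_1^K\left[\sum_1^N(X - \bar X_{\mathcal{P}})^2\right] = \frac{N}{\mathcal{P}}K\sum_1^{\mathcal{P}}(X - \bar X_{\mathcal{P}})^2 = NK\,\frac{\sum_1^{\mathcal{P}}(X - \bar X_{\mathcal{P}})^2}{\mathcal{P}} = NK\sigma^2. \]

\[ \sum_1^K\left[\sum_1^N(X - \bar X)^2\right] = \sum_1^K[Ns^2] = N\sum_1^K s^2, \]

where $s^2$ is the variance, $\dfrac{\Sigma(X - \bar X)^2}{N}$, of a sample.

\[ \sum_1^K[N(\bar X - \bar X_{\mathcal{P}})^2] = N\sum_1^K(\bar X - \bar X_{\mathcal{P}})^2 = NK\sigma_{\bar X}^2. \]

We may now write

\[ NK\sigma^2 = N\sum_1^K s^2 + NK\sigma_{\bar X}^2, \]

and, dividing by $K$,

\[ N\sigma^2 = N\overline{s^2} + N\sigma_{\bar X}^2, \]

where $\overline{s^2}$ is the arithmetic mean of the $s^2$ values.

\[ N\sigma^2 = N\overline{s^2} + N\,\frac{\sigma^2}{N} = N\overline{s^2} + \sigma^2. \]

\[ N\sigma^2 - \sigma^2 = N\overline{s^2}. \]

\[ \sigma^2(N - 1) = N\overline{s^2}. \]

\[ \sigma^2 = \frac{N\overline{s^2}}{N - 1} = \frac{N}{N - 1}\cdot\frac{s_1^2 + s_2^2 + \cdots + s_K^2}{K} = \frac{\dfrac{Ns_1^2}{N - 1} + \dfrac{Ns_2^2}{N - 1} + \cdots + \dfrac{Ns_K^2}{N - 1}}{K} \]

\[ = \frac{\dfrac{\Sigma x_1^2}{N - 1} + \dfrac{\Sigma x_2^2}{N - 1} + \cdots + \dfrac{\Sigma x_K^2}{N - 1}}{K} = \frac{\hat\sigma_1^2 + \hat\sigma_2^2 + \cdots + \hat\sigma_K^2}{K}. \]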
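
This result can also be confirmed by enumeration. The sketch below assumes sampling with replacement from a hypothetical four-item population, so that the variance of the sample means is exactly σ²/N as the proof requires; exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 4, 7]        # hypothetical population
P = len(population)
N = 2                            # sample size

pop_mean = Fraction(sum(population), P)
sigma2 = sum((Fraction(v) - pop_mean) ** 2 for v in population) / P

def estimated_variance(sample):
    """Sum of squared deviations from the sample mean, divided by N - 1."""
    m = Fraction(sum(sample), len(sample))
    return sum((Fraction(v) - m) ** 2 for v in sample) / (len(sample) - 1)

# All P**N equally likely ordered samples drawn with replacement.
samples = list(product(population, repeat=N))
mean_estimate = sum(estimated_variance(s) for s in samples) / len(samples)
print(mean_estimate, sigma2)
```

Dividing by N instead of N - 1 in `estimated_variance` would give an average below the population variance, which is the bias the N - 1 divisor removes.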


Section 24.4 

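To prove that $\sigma_{\bar X_1 - \bar X_2} = \sqrt{\sigma_{\bar X_1}^2 + \sigma_{\bar X_2}^2}$, for independent samples.

Given two independent series of paired arithmetic means, the means
being for random samples of the same size, and each series consisting of
$K$ means, as follows:

Sample      Series 1             Series 2             Difference
1           $\bar X_{1.1}$       $\bar X_{2.1}$       $\bar X_{1.1} - \bar X_{2.1}$
2           $\bar X_{1.2}$       $\bar X_{2.2}$       $\bar X_{1.2} - \bar X_{2.2}$
3           $\bar X_{1.3}$       $\bar X_{2.3}$       $\bar X_{1.3} - \bar X_{2.3}$
. . .
K           $\bar X_{1.K}$       $\bar X_{2.K}$       $\bar X_{1.K} - \bar X_{2.K}$

The variance of the differences is

\[ \sigma_{\bar X_1 - \bar X_2}^2 = \frac{\displaystyle\sum_1^K\left[(\bar X_1 - \bar X_2) - \overline{(\bar X_1 - \bar X_2)}\right]^2}{K}, \]

where $\overline{(\bar X_1 - \bar X_2)}$ is the arithmetic mean of the differences and may be
written

\[ \frac{\displaystyle\sum_1^K(\bar X_1 - \bar X_2)}{K} = \frac{\displaystyle\sum_1^K\bar X_1}{K} - \frac{\displaystyle\sum_1^K\bar X_2}{K} = \bar{\bar X}_1 - \bar{\bar X}_2, \]

where $\bar{\bar X}_1$ and $\bar{\bar X}_2$ are the arithmetic means of series 1 and series 2,
so that

\[ \sigma_{\bar X_1 - \bar X_2}^2 = \frac{\displaystyle\sum_1^K[(\bar X_1 - \bar X_2) - (\bar{\bar X}_1 - \bar{\bar X}_2)]^2}{K} = \frac{\displaystyle\sum_1^K[(\bar X_1 - \bar{\bar X}_1) - (\bar X_2 - \bar{\bar X}_2)]^2}{K}. \]

Writing $\bar x_1 = \bar X_1 - \bar{\bar X}_1$ and $\bar x_2 = \bar X_2 - \bar{\bar X}_2$, we have

\[ \sigma_{\bar X_1 - \bar X_2}^2 = \frac{\displaystyle\sum_1^K(\bar x_1 - \bar x_2)^2}{K} = \frac{\displaystyle\sum_1^K(\bar x_1^2 - 2\bar x_1\bar x_2 + \bar x_2^2)}{K} = \frac{\displaystyle\sum_1^K\bar x_1^2}{K} - 2\,\frac{\displaystyle\sum_1^K\bar x_1\bar x_2}{K} + \frac{\displaystyle\sum_1^K\bar x_2^2}{K}. \]

$\dfrac{\sum_1^K\bar x_1\bar x_2}{K}$ is a portion of the expression for the correlation coefficient
for the two series of means, which may be written $r_{\bar X_1\bar X_2} = \dfrac{\sum_1^K\bar x_1\bar x_2}{K\sigma_{\bar X_1}\sigma_{\bar X_2}}$ (see
page 465 for the product-moment formula for $r$ for a sample), so that

\[ 2\,\frac{\displaystyle\sum_1^K\bar x_1\bar x_2}{K} = 2r_{\bar X_1\bar X_2}\sigma_{\bar X_1}\sigma_{\bar X_2}. \quad \text{Also, } \frac{\displaystyle\sum_1^K\bar x_1^2}{K} = \sigma_{\bar X_1}^2, \text{ and } \frac{\displaystyle\sum_1^K\bar x_2^2}{K} = \sigma_{\bar X_2}^2. \]

Therefore,

\[ \sigma_{\bar X_1 - \bar X_2}^2 = \sigma_{\bar X_1}^2 - 2r_{\bar X_1\bar X_2}\sigma_{\bar X_1}\sigma_{\bar X_2} + \sigma_{\bar X_2}^2, \text{ and} \]

\[ \sigma_{\bar X_1 - \bar X_2} = \sqrt{\sigma_{\bar X_1}^2 - 2r_{\bar X_1\bar X_2}\sigma_{\bar X_1}\sigma_{\bar X_2} + \sigma_{\bar X_2}^2}. \]

Since the two series of means are independent, $r_{\bar X_1\bar X_2} = 0$ and

\[ \sigma_{\bar X_1 - \bar X_2} = \sqrt{\sigma_{\bar X_1}^2 + \sigma_{\bar X_2}^2}. \]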


Section 24.5 

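$\dfrac{\hat\sigma_1^2 + \hat\sigma_2^2}{2}$ is an equally weighted average of $\hat\sigma_1^2$ and $\hat\sigma_2^2$. Using weights
equal to the number of degrees of freedom ($N_1 - 1$ and $N_2 - 1$) in each
of the two samples, we have

\[ \hat\sigma_{1+2}^2 = \frac{(N_1 - 1)\hat\sigma_1^2 + (N_2 - 1)\hat\sigma_2^2}{N_1 - 1 + N_2 - 1} \]

\[ = \frac{(N_1 - 1)\dfrac{\Sigma x_1^2}{N_1 - 1} + (N_2 - 1)\dfrac{\Sigma x_2^2}{N_2 - 1}}{N_1 - 1 + N_2 - 1} \]

\[ = \frac{\Sigma x_1^2 + \Sigma x_2^2}{N_1 - 1 + N_2 - 1}. \]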
Section 24.6


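To prove that

\[ \hat\sigma_{1+2}\sqrt{\frac{1}{N_1} + \frac{1}{N_2}} = \sqrt{\frac{\hat\sigma_1^2}{N_1} + \frac{\hat\sigma_2^2}{N_2}} \quad \text{when } N_1 = N_2 = N. \]

\[ \hat\sigma_{1+2}\sqrt{\frac{1}{N_1} + \frac{1}{N_2}} = \sqrt{\frac{\dfrac{(N - 1)\hat\sigma_1^2 + (N - 1)\hat\sigma_2^2}{N - 1 + N - 1}}{N} + \frac{\dfrac{(N - 1)\hat\sigma_1^2 + (N - 1)\hat\sigma_2^2}{N - 1 + N - 1}}{N}} \]

\[ = \sqrt{\frac{\dfrac{(N - 1)(\hat\sigma_1^2 + \hat\sigma_2^2)}{2N - 2}}{N} + \frac{\dfrac{(N - 1)(\hat\sigma_1^2 + \hat\sigma_2^2)}{2N - 2}}{N}} \]

\[ = \sqrt{\frac{\dfrac{\hat\sigma_1^2 + \hat\sigma_2^2}{2}}{N} + \frac{\dfrac{\hat\sigma_1^2 + \hat\sigma_2^2}{2}}{N}} \]

\[ = \sqrt{\frac{\hat\sigma_1^2 + \hat\sigma_2^2}{N}} \]

\[ = \sqrt{\frac{\hat\sigma_1^2}{N_1} + \frac{\hat\sigma_2^2}{N_2}}. \]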


Section 25.1 

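To prove that $\sigma_p = \sqrt{\dfrac{\pi\tau}{N}}$.

A proportion $p$ is the arithmetic mean of a series of values where each
occurrence equals 1 and each non-occurrence equals zero.

For a sample, we have:

                     Number      Proportion
Occurrences          $a$         $p$
Non-occurrences      $b$         $q$
Total                $N$         1.0

It is obvious that $a = Np$ and $b = Nq$.

Since an occurrence equals 1 and a non-occurrence equals zero, we have

\[ \bar X = \frac{a(1) + b(0)}{N} = \frac{a}{N} = p, \]

and it follows that

\[ \sigma_{\bar X} = \sigma_p = \frac{\sigma}{\sqrt{N}}. \]

To obtain an expression for $\sigma$, we use the following population symbols:

                     Number            Proportion
Occurrences          $\alpha$          $\pi$
Non-occurrences      $\beta$           $\tau$
Total                $\mathcal{P}$     1.0

It is clear that $\pi = \dfrac{\alpha}{\mathcal{P}}$ and $\tau = \dfrac{\beta}{\mathcal{P}}$.

Again, each occurrence equals 1 and each non-occurrence equals zero,
so that

\[ \sigma^2 = \frac{\alpha(1)^2 + \beta(0)^2}{\mathcal{P}} - \left[\frac{\alpha(1) + \beta(0)}{\mathcal{P}}\right]^2 = \frac{\alpha}{\mathcal{P}} - \left(\frac{\alpha}{\mathcal{P}}\right)^2 = \pi - \pi^2 = \pi(1 - \pi) = \pi\tau, \]

\[ \sigma = \sqrt{\pi\tau}. \]

We may now write

\[ \sigma_p = \frac{\sigma}{\sqrt{N}} = \frac{\sqrt{\pi\tau}}{\sqrt{N}} = \sqrt{\frac{\pi\tau}{N}}. \]

Since $a = Np$, we may also write

\[ \sigma_a = N\sigma_p = N\sqrt{\frac{\pi\tau}{N}} = \sqrt{N\pi\tau}. \]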
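
The formula may be compared with the standard deviation obtained directly from the distribution of the number of occurrences. The sketch below assumes an infinite population (so that the number of occurrences in a sample follows the binomial distribution) and hypothetical values π = 0.3 and N = 20; the function name is chosen for this example.

```python
import math

def proportion_sd_by_enumeration(pi, n):
    """Standard deviation of p = a/n, summing over every possible count a."""
    tau = 1 - pi
    probabilities = [math.comb(n, a) * pi ** a * tau ** (n - a) for a in range(n + 1)]
    mean_p = sum(prob * a / n for a, prob in enumerate(probabilities))
    variance_p = sum(prob * (a / n - mean_p) ** 2 for a, prob in enumerate(probabilities))
    return math.sqrt(variance_p)

pi, n = 0.3, 20                                   # hypothetical proportion and sample size
enumerated = proportion_sd_by_enumeration(pi, n)
by_formula = math.sqrt(pi * (1 - pi) / n)         # sqrt(pi * tau / N)
print(enumerated, by_formula)
```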





Section 26.1 


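To prove that

\[ \sum_1^{k_c}[N_c(\bar X_c - \bar X)^2] = \sum_1^{k_c}\left[\frac{(\Sigma X_c)^2}{N_c}\right] - \frac{(\Sigma X)^2}{N}. \]

The expression on the left says: “For each column, square the deviation
of the column mean from the grand mean, multiply by the number of
items in the column, and sum these products for all columns.”

\[ \sum_1^{k_c}[N_c(\bar X_c - \bar X)^2] = \sum_1^{k_c}[N_c(\bar X_c^2 - 2\bar X\bar X_c + \bar X^2)], \]

\[ = \sum_1^{k_c}(N_c\bar X_c^2 - 2N_c\bar X\bar X_c + N_c\bar X^2), \]

\[ = \sum_1^{k_c}(N_c\bar X_c^2) - 2\bar X\sum_1^{k_c}(N_c\bar X_c) + \sum_1^{k_c}(N_c\bar X^2). \]

Since $N_c\bar X_c = \Sigma X_c$, $\sum_1^{k_c}(N_c\bar X_c) = \Sigma X = N\bar X$, and $\sum_1^{k_c}N_c = N$, this becomes

\[ \sum_1^{k_c}\left[\frac{(\Sigma X_c)^2}{N_c}\right] - 2N\bar X^2 + N\bar X^2 = \sum_1^{k_c}\left[\frac{(\Sigma X_c)^2}{N_c}\right] - \frac{(\Sigma X)^2}{N}. \]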
! I 1 
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To prove that 

k, 

S 

1 


Section 26.2 



The expression on the left says: “For each column, total the squared 
deviations from the mean of that column and sum these totals for all 
columns.” 
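Section 26.3

To prove that

\[ \sqrt{\frac{r^2(N - 2)}{1 - r^2}} = \sqrt{\frac{b^2\Sigma x^2(N - 2)}{\Sigma y_s^2}}. \]

Since $b = \dfrac{\Sigma xy}{\Sigma x^2}$, $r^2 = \dfrac{(\Sigma xy)^2}{\Sigma x^2\,\Sigma y^2}$, and $1 - r^2 = \dfrac{\Sigma y_s^2}{\Sigma y^2}$,

\[ \frac{r^2(N - 2)}{1 - r^2} = \frac{\dfrac{(\Sigma xy)^2}{\Sigma x^2\,\Sigma y^2}(N - 2)}{\dfrac{\Sigma y_s^2}{\Sigma y^2}} = \frac{\dfrac{(\Sigma xy)^2}{\Sigma x^2}(N - 2)}{\Sigma y_s^2}. \]

But $\dfrac{(\Sigma xy)^2}{\Sigma x^2} = b^2\Sigma x^2$, and

\[ \sqrt{\frac{r^2(N - 2)}{1 - r^2}} = \sqrt{\frac{b^2\Sigma x^2(N - 2)}{\Sigma y_s^2}}. \]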


Section 26,4 


To prove that ^ F for coefficients of partial correlation. That isj 
that 

^lm.23 • * • (^^cX.234 « » » m ^^el.234 > ♦ • (m-p) {N — W) 


' lm.23 • • • (w— 1) 


Since rL.23...(m_i) 


2x® 


cl.234* * • m 


2x." 


2xf — SXgi.234 ---m 

■cl.234«»»(m~-l) 


So?! — ^^cl,2S4---(m^l) 


we may write 


’ lm.23 * * * (m- 


.-lAN - m) 


1 — ^1^.23... 


(m-1) 




234* 


24i. 


234‘**(tn-X) 


2x! - Sx,\. 


(A - m) 


234 * ♦ ■ (m— 1) 


2x? - 2xl^. 


234**«(m~X) 


;y«2 

^*^cl.234’ • -wt 


2xf,. 


234 • • • (TO— 1> 


2x f— Sirci.234** • (TO— 1) 2 IZci. 234** - (wi—l) 

Sa?i — 2:5^1,234“* 1 


(SaJci 234---TO ”” ^^cX.234 * ■ * (to—X)) 



APPENDIX T


Rounding Numbers* 


Terminology 

Original data result from measurements (which can never be exact) or
from counting. Measurements will therefore always be rounded; counts
may be rounded. A number which is the result of rounding always
represents a range of possible values rather than a single value. Thus, if
such a number is recorded as 78 pounds, we know that the true value is
not lower than 77.5 pounds nor higher than 78.5 pounds.

A digit is significant if the error in the next position to the right does
not exceed ±5. Thus, if a measurement is recorded as 172.3 pounds, we
assume that the correct value does not lie beyond the limits of 172.3 ±
0.05, or 172.25 pounds and 172.35 pounds, and there are four significant
digits. It is sometimes difficult to ascertain the number of significant
digits, even in an enumeration. Thus, it is extremely unlikely that there
were exactly 150,697,361 persons in the United States on April 1, 1950,
as reported by the Bureau of the Census.

Below are given three illustrations of correct terminology for measure-
ments that have been accurately made and properly recorded, or for
rounded enumerations:

127.34 is said to contain five significant digits. It has been rounded
to five significant digits, or to two significant decimal places.

4,125 thousand or 4.125 million or 4,125 × 10³ or 4,125,ooo, is signifi-
cant to four digits. If occurring in a table, usually 4,125 is recorded, with
a prefatory note or column heading specifying thousands. The number
of significant digits in 4,125,000 is ambiguous, since it may range from
four to seven. The context, however, often indicates the number of
significant digits. There is no ambiguity if a number ends in zero after

* This discussion of rounding numbers is from F. E. Croxton and D. J. Cowden,
Practical Business Statistics, Second Edition, Prentice-Hall, Inc., New York, 1948,
pp. 503-506.
a decimal point. Thus 4,125.0 and 4.1250 each have five significant
digits.

0.00031 contains two rather than five significant digits (though 0.10031
contains five and 1.00034 contains six). This is because the choice of a
unit of measurement is arbitrary. For instance, 0.031 meters is also 31
millimeters. The importance of this concept will be apparent when rules
for multiplying and dividing rounded numbers are given.

Rules for Rounding

1. If the leftmost of the digits discarded is less than 5, the preceding
digit is not affected. Thus 113.746 becomes 113.7 when rounded to four
digits.

2. If the leftmost of the digits discarded is greater than 5, or is 5 fol-
lowed by digits not all of which are zero if carried out to a sufficient num-
ber of digits, the preceding digit is increased by one. Thus, 129.673
becomes 129.7 when rounded to four digits. Also, 87.2500001 becomes
87.3 when rounded to three digits.

3. If the leftmost of the digits discarded is 5, followed by zeros, the pre-
ceding digit is increased by one if it is odd, and left unchanged if it is
even. The number is thus rounded in such a manner that the last digit
retained is even. For example, 103.55 becomes 103.6 and 103.45 becomes
103.4 when rounded to four digits. (However, 103.5499 becomes 103.5
as explained in paragraph 1, and 103.4501 becomes 103.5 as explained in
paragraph 2.) This rule is adopted in order to avoid the cumulation of
errors in summations, which could result if the preceding digit were always
raised or always left unchanged. The rule (making the last digit even)
is more generally used than its reverse (making the last digit odd). It
is more convenient than alternately adding and dropping the half, since
one is spared the trouble of remembering which was done last.
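
The three rules can be tried on the examples given above. The following is a sketch using Python's `decimal` module, whose `ROUND_HALF_EVEN` mode rounds an exact half to the even digit; the helper name `round_significant` is an assumption of this example, and numbers are passed as strings so that no binary rounding error is introduced before the rule is applied.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_significant(text, digits):
    """Round a decimal string to `digits` significant digits, halves going to even."""
    value = Decimal(text)
    # Exponent of the last digit kept, e.g. -1 for 113.746 rounded to four digits.
    exponent = value.adjusted() - digits + 1
    return value.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_EVEN)

examples = [
    ("113.746", 4),        # rule 1
    ("129.673", 4),        # rule 2
    ("87.2500001", 3),     # rule 2: 5 followed by digits not all zero
    ("103.55", 4),         # rule 3: preceding digit odd, raised
    ("103.45", 4),         # rule 3: preceding digit even, unchanged
    ("103.5499", 4),
    ("103.4501", 4),
]
for text, digits in examples:
    print(text, "->", round_significant(text, digits))
```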

Products and Quotients Obtained from Rounded Numbers

1. In multiplication (including squaring), division, or extraction of
square root, one should not record as a final answer more digits than there
are in the original number with the fewest significant digits.¹ The follow-


¹ In special circumstances an exception may be made to this rule, provided the
number of digits that are significant in the answer is clearly indicated.

Where several computations involving multiplication, division, or extracting a
square root are involved in working with one set of data, it is sometimes advisable
to record one more digit in intermediate computations than there are in the original
number with the fewest significant digits. Sometimes more than one nonsignificant
digit may be desirable. In this volume we have sometimes carried more than one
nonsignificant digit in order to obtain a formal check on the accuracy of our com- 
putations. While the extra digits may not be absolutely* accurate, they are suffi- 
ing illustrations thus indicate the maximum number of digits which it is
good practice to record:


358 × 412              = 147 thousand.
14 × 427               = 6.0 thousand.
3,194 × 25 × 427       = 34 million.
4,831 × 0.00412        = 19.9
5,673 × 8 (exactly)    = 45.38 thousand.
25 ÷ 23                = 1.1
42.7 ÷ 52              = 0.82
52 ÷ 42.7              = 1.2
√0.354                 = 0.595


In the above illustrations the maximum number of digits that may be 
significant is recorded; in some instances the number significant will be 
fewer than the number recorded.^ 

2. If a given number of significant digits is required in the final answer, 
each of the original numbers and each of the intermediate results should 
have one more significant digit than the number of digits required in the 
answer. If any of the original data contain more digits than called for 
by this rule, the excess digits may be rounded off. Thus, if three digits 
are required in the final answer, we may proceed as follows: 


\[ \sqrt{\frac{(2.7608)^2}{(13.195)(0.87367)}} = \sqrt{\frac{(2.761)^2}{(13.20)(0.8737)}} = \sqrt{\frac{7.623}{11.53}} = \sqrt{0.6611} = 0.813. \]


As is almost always the case, the final answer is the same as if we had 
retained all of the original digits and also one more digit in each inter- 
mediate step: 


\[ \sqrt{\frac{(2.7608)^2}{(13.195)(0.87367)}} = \sqrt{\frac{7.6220}{11.528}} = \sqrt{0.66117} = 0.813. \]


The rounding of the original data is justified because of the small


eiently close to contribute something to iiie final answer. For instance, if we want 
three digits in our final answer and have (4.137 × 0.684) ÷ (0.316 × 7.831), we
would employ 2.830 ÷ 2.475 = 1.14 rather than 2.83 ÷ 2.47 = 1.15.

² In the case of the seventh illustration there is, strictly speaking, only one signifi-
cant digit in the answer. Remembering that a rounded number recorded as 42.7
may vary between 42.65 and 42.75, while one recorded as 52 may vary between 51.5
and 52.5, we may compute:

42.75 ÷ 51.5 = .830 to three digits, the largest possible result;

42.7 ÷ 52 = .821 to three digits;

42.65 ÷ 52.5 = .812 to three digits, the smallest possible result.

Since .830 and .812 are not included within .821 ± .005, it is apparent that the
second digit in .821 is not significant.
probability that most of the numbers involved will be in error close to
the maximum possible amount, and the large probability that there will
be considerable offsetting of errors.

3. When the correct product or quotient is known in advance, it should
be recorded rather than the approximate product or quotient resulting
from use of the rounded original numbers. Thus, although 0.125 ×
0.333 = 0.0416, if it is known that the actual operation is 1/8 × 1/3 = 1/24 =
0.0417, the answer should be recorded as 0.0417 rather than 0.0416.
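
The limits of error for a quotient of rounded numbers, as worked out in the footnote on the seventh illustration (42.7 ÷ 52), can be computed mechanically. The sketch below is illustrative only; the helper name `rounded_bounds` is an assumption of this example.

```python
def rounded_bounds(recorded, decimals):
    """Smallest and largest true values consistent with a rounded record."""
    half_unit = 0.5 * 10 ** (-decimals)
    return recorded - half_unit, recorded + half_unit

numerator_low, numerator_high = rounded_bounds(42.7, 1)    # 42.65 to 42.75
denominator_low, denominator_high = rounded_bounds(52, 0)  # 51.5 to 52.5

largest = numerator_high / denominator_low     # largest possible quotient
nominal = 42.7 / 52                            # quotient of the recorded values
smallest = numerator_low / denominator_high    # smallest possible quotient

print(round(largest, 3), round(nominal, 3), round(smallest, 3))
# The second digit is significant only if both limits lie within nominal ± 0.005.
second_digit_significant = (largest - nominal <= 0.005) and (nominal - smallest <= 0.005)
print(second_digit_significant)
```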

Sums and Differences Obtained from Rounded Numbers 

Rules for addition and subtraction substantially parallel those for
multiplication and division, except that it is the number of significant
decimal places, rather than the number of significant digits, that must be
considered.

1. In addition or subtraction, one should never record as a final answer
more decimal places than there are in the original number with the fewest
significant decimal places. The following illustrations thus indicate the
maximum number of digits which it is good practice to record:

2,156.2 + 39 = 2,195.

2,156.2 - 39 = 2,117.

13 + 12 = 25.

13 - 12 = 1.

In the above illustrations the maximum number of significant decimal
places is recorded; in some instances the number significant will be fewer
than the number recorded.³

2. If a given number of significant decimal places is required in the 
final answer, it is desirable that each of the original numbers have one 
more significant decimal place than the number of decimal places required 
in the answer. If any of the original data contain more digits than called 
for by this rule, the excess digits may be rounded off. Thus, if no decimal 
place (no digit to the right of the decimal point) is required in the final 
answer, we may proceed as follows: 


122.34                              122.3
 81.7      may be rounded to         81.7
293.826                             293.8
-------                             -----
497.866                             497.8

both of which round to 498.


³ If the student will check the last two results by a procedure similar to that
described in footnote 2, he will find that the last digit recorded is not significant, since
the limits of error are ±1.0, instead of the permissible ±0.5.
The rounding of the original data is justified because of the small proba-
bility that most of the numbers involved will be in error close to the
maximum possible amount, and the large probability that there will be
considerable offsetting of errors.

3. When the correct total is known in advance, it should be recorded,
rather than the approximate total resulting from addition of the rounded
numbers. Thus:




                                                      Thousands     Per cent
                                         Dollars      of dollars    of total*

                                         507,334        507.3         66.67
                                         126,832        126.8         16.67
                                         126,834        126.8         16.67

Total of recorded numbers . . . . . .    761,000        760.9        100.01
Record the total known to be correct     761,000        761.0        100.00

* Computed from column 1. Total would not be exactly 100, even if 7 digits were recorded for
each percentage.
Selected List of Readily Available
Sources of Data


For each source, the current title, frequency of appearance, and
issuing organization are given. Many of the sources have had titles
different from those shown, have appeared more or less frequently than
at present, or have been released by different organizations or by the
same organization under a different name. For such changes, see the
introductory paragraphs in the sources.

A. GENERAL 

Statistical data from more than one field will be found in these 
publications of a general nature. 

1, An Almanack (also known as Whitaker ^ b Almanac), Annual. 

Joseph Whitaker, London. 

2, County and City Data Bookf 1962, One previous issue, dated 1949. 

There is also a County Data Book, dated 1947. Bureau of the 
Census. 

3. Dutribuiion DaUi Guide, Monthly. Department of Commerce. 

4. The Economic Almanac, Annual. Published by Thomas Y. Crowell 

Company, New York, for the National Industrial Conference 
Board. 

6, Economic Indicators, „ Monthly. An historical and descriptive 
supplement was issued December 1953. Joint Committee [of 
Congress] on the Economic Report. 

6. Federal Reserve Bulletin. Monthly. Board of Governors of the 

Federal Reserve System. 

7. The Handbook of Bcmc Economic Statistics, Monthly, Economic 

Statistics Bureau of Washington, D. 0. (A private organiza- 
tion.) 

8. Historical Statistics of the United States 1789-1945 and Continuation

to 1952 of Historical Statistics of the United States. Both are
supplements to the Statistical Abstract of the United States.
Bureau of the Census.

9. Monthly Bulletin of Statistics. Statistical Office of the United 
Nations, New York. 

10. Standard and Poor's Trade and Securities Statistics. Current Statistics,
issued monthly, contains cumulative data available since the
previous issue of Current Statistics Combined with Basic Statistics.
This latter publication supplements the eleven basic statistics
pamphlets on various topics and the 1952 edition of Security
Price Index Record. Standard and Poor's Corporation, New
York.

11. The Statesman's Yearbook. Macmillan and Company, Limited,
London. 

12. Statistical Abstract of the United States. Annual. Bureau of the 

Census.

13. Statistical Yearbook. Statistical Office of the United Nations, New

York. 

14. Survey of Current Business. Monthly with weekly supplements. 

Biennial supplements entitled Business Statistics are also issued. 
Office of Business Economics of the Department of Commerce. 

15. The World Almanac and Book of Facts. Annual. New York 

World-Telegram and The Sun. 

Periodicals such as: 

16. Barron's. Weekly. Barron's Publishing Company, New York.

17. Business Week. McGraw-Hill Publishing Company, New York.

18. The Magazine of Wall Street. Bi-weekly. The Ticker Publishing 

Company, New York. 

Daily newspapers. 

B. COMMODITIES: PRICES, PRODUCTION,
CONSUMPTION, STOCKS, EXPORTS, AND IMPORTS

1. Agricultural Prices. Monthly. Agricultural Marketing Service.

2. Agricultural Situation. Monthly. Agricultural Marketing Service. 

3. Agricultural Statistics. Annual. Before 1935, statistical material 

was in the Yearbook of Agriculture. Department of Agriculture. 

4. Annual Survey of Manufactures. Bureau of the Census.

5. Census of Agriculture. Quinquennial since 1920, decennial 1840- 

1920. Bureau of the Census. 

6. Census of Business. Latest, 1948; previous censuses, 1929, 1933, 

1935, and 1939. Data for 1954 collected in 1955. Bureau of 
the Census.

7. Census of Manufactures. Latest, 1947; none taken 1940-1946;

biennial 1921-1939, quinquennial 1904-1919, decennial (1829
omitted) 1809-1899. Data for 1954 collected in 1955. Bureau 
of the Census. 

8. Census of Mines and Quarries. Latest, 1939; approximately decen- 

nial 1840-1939. Data for 1954 collected in 1955. Bureau of 
the Census. 

9. Commodity Yearbook. Not published 1943-1947. Commodity Research

Bureau, Inc., New York.

10. Consumer Price Index. Monthly. Bureau of Labor Statistics.

11. Crops and Markets. Annual. Agricultural Marketing Service.

12. Daily Index Numbers and Spot Primary Market Prices. Weekly.

Daily data available but no daily mailings. Bureau of Labor
Statistics. 

13. Foreign Commerce Weekly. Bureau of Foreign Commerce. 

14. Foreign Trade Reports. Monthly and annual. Bureau of the 

Census.

15. Minerals Yearbook. Bureau of Mines. 

16. Monthly Bulletin of Agricultural Economics and Statistics. Food and 

Agriculture Organization of the United Nations. Rome, Italy.

17. Monthly Labor Review. Bureau of Labor Statistics. 

18. Monthly Retail Trade Report. Bureau of the Census. 

19. Monthly Wholesale Trade Report, Sales and Inventories. Bureau of

the Census. 

20. Quarterly Summary of Foreign Commerce of the United States. Bureau 

of the Census. 

21. Retail Food Prices by Cities. Monthly. Bureau of Labor Statistics. 

22. Retail Prices and Indexes of Fuels and Electricity. Monthly. Bureau 

of Labor Statistics. 

23. Sales Management Survey of Buying Power. Annual. Sales Man- 

agement [Magazine], New York. 

24. Wholesale Price Index [Monthly], Prices and Price Relatives for 

Individual Commodities. Monthly. Bureau of Labor Statistics.

25. Wholesale Price Index [Weekly] and Percent Change in Spot Market

Indexes and For Selected Commodities. Weekly. Bureau of
Labor Statistics.

26. Wholesale (Primary Market) Price Index. Monthly. Bureau of

Labor Statistics. 

Special studies of the various services and divisions of the Department of 
Agriculture, of the Bureau of Labor Statistics, and of state agricul- 
tural experiment stations. 
C. FINANCIAL: MONEY, BANKING, SECURITIES,
INTEREST RATES, TAXATION, ETC.

1. Annual Report of the Board of Governors of the Federal Reserve System.

2. Annual Report of the Comptroller of the Currency.

3. Annual Report of the Federal Deposit Insurance Corporation.

4. Annual Report of the Secretary of the Treasury on the State of the

Finances.

5. Annual Report of the Securities and Exchange Commission.

6. Annual reports of state banking departments. 

7. Assets and Liabilities of Operating Insured Banks. Semiannual.

Federal Deposit Insurance Corporation. 

8. Bulletin of the Treasury Department. Monthly. Department of the
Treasury.

9. The Commercial and Financial Chronicle. Semiweekly. William B.
Dana Co., New York.

10. Daily Statements of the United States Treasury. Daily and semimonthly.

Department of the Treasury.

11. Dun's Statistical Review. Monthly. Dun and Bradstreet, Inc.,
New York.

12. Federal Reserve Charts on Bank Credit, Money Rates, and Business.

Monthly with annual supplements. Board of Governors of the 
Federal Reserve System. 

13. Income Distribution in the United States. Data for 1950, 1947, 1946,

and 1944. Office of Business Economics.

14. International Financial Statistics. Monthly. International Monetary

Fund, Washington, D. C.

15. National Income and Product in the United States. The 1954 edition

replaces the 1951 edition. Office of Business Economics.

16. Statistical Bulletin. Monthly. Securities and Exchange Commission.

17. Statistics of Income. Annual. Internal Revenue Service.

Bulletins of the individual Federal Reserve Banks. 

Bulletins of various large banks. 

Data concerning city and state finances are to be found in reports issued 
from time to time by the Bureau of the Census.

D. EMPLOYMENT, WAGES, AND HOURS OF LABOR

1. Employment and Earnings. Monthly. Bureau of Labor Statistics.

2. The Labor Market and Employment Security. Monthly. Bureau of

Employment Security. 





3. Monthly Labor Review. Bureau of Labor Statistics. 

4. Monthly Report on the Labor Force. A Current Population Report.

Bureau of the Census.

5. Yearbook of Labour Statistics. International Labour Office. Geneva.

Bulletins of state bureaus of labor or industrial commissions.

Special bulletins of the Bureau of Labor Statistics and of the Women's
Bureau. 

E. ACTIVITIES OF INDIVIDUAL CONCERNS 

1. Best's Insurance Reports (fire and casualty) and Best's Life Insurance

Reports. Annual. Alfred M. Best Company, New York.

2. Fitch Bond Record. Weekly. The Fitch Publishing Company, New

York. 

3. Fitch Individual Bond Bulletins. Listed and unlisted bonds. Four 

each week. The Fitch Publishing Company, New York. 

4. Fitch Individual Stock Bulletins. Listed stocks. Five each week. 

The Fitch Publishing Company, New York. 

5. Fitch Stock Record. Monthly. The Fitch Publishing Company, 

New York. 

6. Fitch Unlisted Securities Service. Unlisted stocks. Four each week.

The Fitch Publishing Company, New York. 

7. Media Records. Newspapers and newspaper advertisers. Monthly, 

quarterly, and annual; also special reports. Media Records, 
Inc., New York. 

8. Moody's Bond Survey. Weekly. Moody's Investors Service, New

York.

9. Moody's Manual of Investments. Five volumes: industrials; railroads;

public utilities; governments and municipals; banks,
insurance, real estate, and investment trusts. Annual with
semiweekly bulletins. Moody's Investors Service, New York.

10. Moody's Stock Survey. Weekly. Moody's Investors Service, New

York. 

11. Security Owners Stock Guide. Monthly and year-end. Standard 

and Poor's Corporation, New York.

12. The Spectator Insurance Year Book. Two volumes: life; fire and
marine, casualty, and surety. Annual. The Spectator Company,
Philadelphia.

13. Standard Corporation Records. Daily dividend section with weekly,
monthly, and annual cumulations; daily news section with partial
cumulations each month; descriptions of corporations continuously
revised resulting in complete revision each year.
Standard and Poor's Corporation, New York.

Reports of state insurance commissioners.

Annual reports of corporations to their stockholders.

F. MISCELLANEOUS

1. Annual Report of the Commissioner of Internal Revenue.

2. Annual Report of the Immigration and Naturalization Service.

3. Automobile Facts and Figures. Annual. Automobile Manufacturers

Association, Detroit.

4. Census of Housing. 1950 and 1940. Bureau of the Census.

5. Census of Population. Decennial. Bureau of the Census.

6. Construction Review. Monthly. Bureau of Labor Statistics and the

Building Materials and Construction Division of the Department
of Commerce.

7. Current Population Reports. Deal with labor force (monthly, see

reference D-4), population estimates, population character- 
istics, special population censuses, and consumer income. 
Intervals of issue vary. Bureau of the Census. 

8. Demographic Yearbook of the United Nations. New York. 

9. Dodge Statistical Research Service. Construction data. Monthly. 

F. W. Dodge Corporation, New York. 

10. Electric Power Statistics. Monthly. Federal Power Commission.

11. Highway Statistics. Annual. Bureau of Public Roads.

12. Life Insurance Fact Book. Annual. Institute of Life Insurance,

New York. 

13. Monthly Review. Railroad Retirement Board. 

14. Monthly Survey of Life Insurance Sales in the United States and 

Canada. Life Insurance Agency Management Association,
Hartford.

15. Monthly Vital Statistics Report. National Office of Vital Statistics.

16. Motor Truck Facts. Annual. Automobile Manufacturers Associ- 

ation, Detroit. 

17. Municipal Yearbook. International City Managers Association,

Chicago.

18. Public Health Reports. Monthly. Public Health Service.

19. Social Security Bulletin. Monthly. Social Security Board.

20. Statistical Handbook of Civil Aviation. Annual with quarterly supplements.

Civil Aeronautics Administration.

21. Statistics of Railways in the United States. Annual. Interstate Commerce

Commission.

22. Statistics of the Communications Industry in the United States.

Annual. Federal Communications Commission.

23. Vital Statistics of the United States. Annual. National Office of

Vital Statistics. 

24. A Yearbook of Railroad Information. Eastern Railroad Presidents' 

Conference, New York. 

Bulletins of university bureaus of social, economic, and business research. 

Monographs and special studies of the Bureau of the Census, the Bureau 
of Foreign Commerce, the Office of Business Economics, the Bureau 
of Labor Statistics, the Office of Education, the Agricultural Marketing
Service, and numerous other governmental offices, bureaus, commissions,
and boards.

Statistical information concerning specific industries may be had from
trade papers and trade associations.

A list of sources of data is given on pp. 306-307 of Business Statistics for
1953. See reference A-14, above.
A

Adler, F., 689n

Aggregative price index numbers: 
simple, 405-406 
weighted, 406-413 

approximate weights, 412-413 
average quantities, 410 
base-period quantities, 409 
given-year quantities, 409-410 
group weights, 417-418 
highest common factor, 410-411 
“ideal,” 411-412
Marshall-Edgeworth, 410 
Aggregative quantity index numbers, 421- 
423 

Agricultural Marketing Service Indexes, 
439-441 

Alienation, coefficient of, 463n 
Alphas, 231, 234, 239, 619-622 
American Institute of Public Opinion, 
sampling method of, 32 
A, T. and T. Index of Industrial Activity, 
444 

Amplitude ratio, 361 
moving, 362 

Analysis of variance (see also Variance and
Variation) : 

described, 706, 709-711 
one criterion of classification, 708-711 
test of seasonal index, 339 
two criteria of classification; 
one entry in a box, 711-714 
several entries in a box, 711-714 
used in correlation: 

multiple correlation, 733-734 
non-linear correlation, 728-732 
partial correlation, 735 
two- variable linear correlation, 723n 
Area sample, 29 
Arithmetic mean: 

behavior of, from samples, 626-634 
comparison of several from samples (see 
Analysis of variance) 
confidence limits of, 648-650, 653-654 
definition, 173 

dispersion of, from samples, 632-634 
graphic location, frequency curve, 192- 
193 

kurtosis of, from samples, 629-631 
mean of, from samples, 627 
modified forms, 182-183, 322-323, 335-
339 

of averages, 184-185 


Arithmetic mean (cont.):
of grouped data:

long method, 176-179
open-end classes, 182
short methods, 179-181
unequal class intervals, 181-182
of percentages, 151-152, 183-184, 680
of ungrouped data, 173-174 
properties of, 174-176 
significance tests of difference between: 
sample mean and population mean, 
635-650 

two sample means, 651-657 
skewness of, from samples, 627-629
standard error of, from samples, 632- 
633, 807-810 

Arithmetic mean, median, and mode, 
characteristics of: 
algebraic treatment, 193 
extreme values, effect of, 195-196 
familiarity of, 192
graphic location of, 193 
irregularity of data, effect of, 196 
mathematical properties of, 197 
need for classifying data, 193-194 
open-end classes, effect of, 194-195 
reliability of, 197 

selection of appropriate measure, 197- 
198 

skewness, effect of, 195 
unequal class intervals, effect of, 194 
Arithmetic probability paper, 607 
Arithmetic progression, 93-94, 103, 104 
Arrangement in tables: 
alphabetical, 58 
customary, 59 
geographical, 58 
historical, 59 
magnitude, 59 
numerical, 60 
progressive, 59-60
Array, 154-156

Asymmetrical curve (see Skewed curve)
Asymmetry (see Skewness)

Asymptotic growth curves (see Modified
exponential; Gompertz; Logistic)
Average (see Central tendency)

Average deviation, 215
Average-of-relatives index number (see
Index numbers)

Axes, for curves, 68-71
Ayres' Index of State School Systems,
396 

Ayres, L, 396 
B

Bar chart:

compared with simple curve, 122-123
complex types, 121-123
component-part, 126-130
frequency distribution column diagram, 
73-74 

simple, 119-120 
Barton, H. C., 330n, 350n 
Base line, 80 
Betas: 

coefficients in correlation, 557-558 
measures of skewness and kurtosis, 229- 
23 ^ 19-622 

significance of measures of skewness and
kurtosis, 720-722
tables, 764-765
Bias, 7

in a sample, 31, 33-34
Biased estimate of σ², 644
Bi-modality, 191-192
Binomial: 

and normal curve, 591-594 
fitting of, 607-613 

used with sample proportions, 661-663, 
667-672, 675-679 
Birth rates, 144 
Brinton, W. C., 76n 
Bruce, D., 457, 488 
Brumbaugh, M. A., 388n 
Burns, A. F., 392n, 579n
Business activity, indexes of, 442-446
Business activity in Pittsburgh, index of,
445-446

Business cycles (see Cyclical movements) 

C 

Calendar, flexible, of working days, 738- 
739 

Calendar variation, adjustment for, 253-
256, 326-328
Campbell, N. R., 591n
Camp-Meidell inequality, 221 
Card punch, 42-43 

Causation confused with association, 9-10, 
469-470 

Central tendency, measures of (see also
Arithmetic mean, Geometric mean,
Harmonic mean, Median, and
Mode):

arithmetic mean, 173-185, 415
comparison of arithmetic mean, geometric
mean, and harmonic mean,
199-201, 204-209, 793-794
comparison of arithmetic mean, median,
and mode, 192-198
geometric mean, 198-203, 418-420
harmonic mean, 203-209, 420, 430
median, 185-187
mode, 189-191

modified mean, 182-183, 322-323, 335-339
quadratic mean, 209 
Chaddock, R. E., 150n, 480n 
Chain index: 

advantages and disadvantages, 431-432 
description, 431
illustration, 431
Changing seasonal, 247-248
progressive, 340-351
sudden, 351-362


Chart construction, rules for (see specific 
type of chart) 

Chart proportions, 81-83 
Charts, types of (see also specific types of 
charts), 68-69

Chebycheff's (Tchebycheff) inequality, 221
Chi-square: 

alternative exact methods, 683, 686-689 
curves of, 682 

degrees of freedom for, 681, 685-686, 690- 
691, 699 

distribution of, 683 

relation to coefficient of mean square
contingency, 481n

relation to normal, t, and F distributions,
720-721

table of values, 752-753
used as “goodness of fit” test, 690-691
used to obtain confidence limits of σ²,
701-702

used to test significance of 5*® or 699- 
701 

used with 1 × 2 tables, 681-683
used with 1 × R tables, 689-691
used with 2 × 2 tables, 684-686
used with 2 × 3 and larger tables, 691-
693

used with variances, 699-702
when same as p - π test, 681-683
when same as p₁ - p₂ test, 684-686
Circle diagrams, 126-130 
Classification : 
bases of, 3-6 
chronological, 4 
concealed, 11 
geographical, 4-5 
qualitative, 3 
quantitative, 3-4 
Clopper, C. J., 678-679
Cluster sample, 28-29
Cochran, W. G., 30n, 31n, 691
Coefficient of:
alienation, 463n 

correlation (see Determination, coeffi- 
cient of) 

determination (see Determination, coeffi- 
cient of) 

kurtosis, 232-236 
likelihood (see L)

mean square contingency, 481-482
net estimation, 533
non-determination, 463
separate determination, 558-559
similarity, 577n 
skewness, 226-232 
variation, 222-225 
Collection of data: 
general plan, 17 
methods: 

enumeration, 16, 34-35 
mail, 16, 18, 35-36 
registration, 16 
procedure outlined, 16 
sample, selection of, 25-34 
schedule: 
editing, 36-37 
making, 18-26
organizing data from, 37-46
use of, 34-36

Commodity Prices, Wholesale, Index of,
398-400, 438-439
Common logarithms:
explanation, 776 
table of, 777-791 

Common Stock Prices, Index of, 441-442 
Component-part charts: 
bar charts, 126-130 
line diagrams, 90-92 
pie diagrams, 126-130 
Compound interest curve, 95n, 202, 290-294 
Confidence limits of: 

arithmetic means, 648-650, 653-654 
coefficients of determination, two-vari- 
able linear, 725 

correlation coefficients, two-variable
linear, 725 

proportions, 671-679 
standard deviations, 701-702 
variances, 701-702 
Consumer Price Index, 394, 436-438 
Contingency, coefficient of mean square, 
481-482 

Continuous variable, 161
Coordinates for charts, 81 
Correlation : 

and averages, 472-473 
and causation, 469-470 
and explained variation, 461-465 
and heterogeneity, 470-472 
and measurement of lag (see Lag)
coefficient of (see Determination, coeffi- 
cient of) 

effect of grouping, 477-478 
first moment correlation, 577 
meaning of, 451-454 
means, use of (see Correlation ratio) 
multiple (see Multiple correlation) 
non-linear (see Non-linear correlation) 
of time series (see Time series correlation) 
partial (see Partial correlation) 

Pearsonian formula (see product-moment 
formula, below) 

population estimate of coefficient (see 
Population estimate) 
product-moment formula, 465-466 
qualitative distributions, 480-482, 524 
ranked data, 478-480 
reliability of measures, 722-736 
two- variable linear: 
grouped data, 473-478 
ungrouped data, 466-469 
Correlation ratio, 520-524 

estimate of value in population, 732 
limitations of, 524 
significance tests, 730-732
Cosgrove, Jessica, 7n
Cowden, D. J., 149n, 183n, 464n, 470n,
738n, 818n
Cox, H., 16

Criterion of fit, general, 262
equal areas, 262
in Glover's method, 291n
least squares, 265-275, 796-798
partial sums, 302, 309, 310
selected points, 310, 315
Criterion of likelihood (see L)

Criterion of significance, choice of, 640-641
Crow, Carl, 35n
Crowder, W. F., 238n, 616n
Croxton, F. E., 117n, 126n, 127n, 149n,
162n, 183n, 328n, 453, 464n, 470n,
471, 504n, 588n, 594n, 666n, 704n,
738n, 748, 749, 818n


Crum, W. L., 388n 
Curves, for presenting data: 
axes, 68-71
base line, 80

chart proportions, 81-84
compared with bar charts, 92, 122-123,
129-130
coordinates, 81
lettering, 84

of frequency distributions, 73-75, 162-
170
origin, 70
quadrants, 68, 70
ruling, 80-81
scale labels, 84
source, 85
title, 84-85

use of vertical scale break, 77
zero on vertical scale, 76-80
Curve type, selection of, 280, 318-319
Curvilinear correlation (see Non-linear
correlation) 

Cycle chart, 387 
Cyclical movements:

comparison of, 384-387, 578-585 
correlation of, 578-585 
explained, 249-251 
methods of isolating: 
direct, 388 

harmonic analysis, 388 
reference-cycle analysis, 388-392 
residual, 367, 373-382 
specific-cycle analysis, 392 

D 

Data, statistical (see also Index numbers,
data for) : 
analysis of, 3-6 
classification of, 3-5 
collection of, 2-3, 16-45 
comparability of, 48-49 
insufficient, 10 
interpretation of, 6 
meaning of, 1 
period data, 71-72 
point data, 71-73 
presentation of: 

by charts, 67-135 (see also Charts) 
by semi-tabular device, 51-52 
by tables, 51-56 (see also Tables, 
statistical) 
by text, 50-51 
sources of, 45-49, 823-829 
tabulation of, 37-45 
Davies, G. H., 238n, 577n, 616, 619 
Death rates, 143-144 
Deciles, 187-189 
Deflating, 257-258, 394 
Degrees of freedom for: 
analysis of variance, 709, 713, 718 
chi-square tables, 681, 685-686, 690-691
tests of correlation measures:
multiple, 733
non-linear, 727-732
partial, 727, 734-735
two-variable linear, 723
tests of differences between:

means of two independent samples,
653

means of two non-independent samples,
666
Degrees of freedom for (cont.):
tests of differences between (cont.):
sample mean and population mean,
645

sample variance and population
variance, 700

several means (see analysis of variance,
above)

two sample variances (see also analysis
of variance, above), 702
De Moivre, A., 590

Demonstrations of formulas, 792-817
Densities (see Frequency densities)
Dependent variable (see Variable)
Determination, coefficient of:
nfultiple-, 

' effect; of additional variables on^^^4 
fotif^ or more independent variates, 
550, 556 

significance tests, 732-734 
three independent variaMes, 548^^56 
two independent vari^les, 534, /^3, 
546n, 555-556 
multiple-partial, 551 
non-linear: 

second-degree curve, 490^^1 
significanc^ests, 726-7^ 
straight Ipfe to logarit^jhms, 511, 619 
straigh^ne to recipjpOcals, 520 
straight line to sqi^re roots, 614 
f degree curv^ 497 

1 535, 543-645, 662-564 

:9, 654-555 
s,/734-736 
>rder, 550-551, 656 
r| 461-466, 468-469, 

4 725 
I, 722-726 

Icient of, estimate of 
lue (see Population, 

I estimates of) I 

Determination, coefficients of separate, 

;/ 558-659 ^ 

Diagram (see specific type of chart;

Scatter diagram)

Discrete variable, 161

Dispersion:
absolute, 213-222
graphic illustration, 212
relative, 222-225
graphic illustration, 224
Doolittle, M. H., 498
Doolittle method, 498-503, 549n
Double logarithmic paper (see Logarithmic
chart)

EWjle, R. P., teO, 702 
Duncan, A. J., 641n


E 

Killer, adjl^^ent for, 35b“359 
Eaton, E. l.,^8n 
Edgeworth, F. Y., 410
Editing schedules, 36-37
Edmunds, Harriet, i^Sn 
Elderlon, W, PL filScNv 
Electronic statikical 42-46 

Elmer, W. ipn ' IX,, 

Emphasis, obtauSingmf, in tapi^svJfi 
Entry IST-ISF^" ^ ^ 


Enumeration, 16

Equation type, fitness of, 280, 318-319,
516-518

Errors: 

Type I, 639 
Type II, 639, 719n 

Estimated standard error (see Standard 
error, estimated) 

Estimating equation • 
multiple correlation 

four OF more independent^varmfelffl, 

M9 

three independent variables, 646-547. 
648 

two independent variables, 633, 542- 
543, 548-549 

multiple curvilinear correlation, 659-660 
non-linear correlation: 
second-degree curve, 486 
straight line to logarithms, 603-604, 
608-609, 612, 518 

straight line to reciprocals, 606*-o0r), 
519 

straight line to square roots, 504-505, 
513-516 

third-degree curve, 486, 493 
two-variable linear 'correlation: 
grouped data, 477 

ungrouped data, 454-458, 466, 467, 
491, 539 

Estimation, net coefficient of, 533 
Explained variation in: 
multiple correlation: 

four or more independent variables, 
550 

three independent variables, 647 
two independent variables, 534, 543 
non-linear correlation: 
correlation ratio, 521-522 
second-degree curve, 490 
straight line to logarithms, 510-611, 
518 

straight line to reciprocals, 520 
straight line to square roots, 514 
third-degree curve, 497
two-variable linear correlation, 461-464,
468, 493, 539-540
Exponential curve:
fitting, 290-294

properties of, 290-291
modified, 298-302
properties of, 298-299
Ezekiel, M., 659n 


F

F:
curves of, 703
definition of, 702-703
distribution of, 703
inclusive of normal, chi-square, and t
distributions, 720
table of values of
used:

in analysis of variance, 709-711, 713-
714, 718-720

in correlation tests, 723n, 728-735
to test significance of difference between
estimated variances, 702-704
#*«'. tabl^ of values of, 747 
Factor reversal test, 427-428
Falkner, Helen D., 326n
Farm housing in Oklahoma, index of, 
446-447 

Federal Reserve Index of Industrial 
Production, 442-444 
Ferber, R., 30n, 482n 
Ferger, W. F., 420, 430 
Fiducial limits (see Confidence limits) 
Fifth-degree curve (see Polynomial series) 
Finney, D. J., 687n
First moment correlation, 577n
First-order partial correlation coefficients,
543-545, 552-554
Fisher, A., 613n

Fisher, I. (see also “Ideal” index number),
403n, 411, 412n, 427, 428,

429

Fisher, R. A., 27n, 289n, 724n, 750, 753,
759

Flexible calendar of working days, 738-
739
Foote, R. J., 347n 
Footnotes in tables, 61 
Forecasting, 111-112, 309-310, 316, 670- 
685 

Fourth-degree curve (see Polynomial 
series) 

Fox, K. A., 347n 

Frequency curves (see also Binomials) : 
fitting of, 594-623
graphic comparison of, 166-168 
ogives, 168-170 
plotting of, 73-76, 162-166 
types of: 

bimodal, 191-192 
reverse J, 164 
skewed, 162-164 
symmetrical, 163 
Frequency densities, 92, 164-166 
Frequency distribution: 
classes: 

and method of reporting values, 160- 
162 

locating mid values, 160-162, 177-178 
number and limits, 159-162 
open-end, 165, 194-195 
points of concentration, 161-162 
comparison of frequency distributions:
different class intervals, 167
different sample sizes, 166-167
construction, 156-159
cumulative, 168-170 
curves: 

on arithmetic paper, 73-76, 168-170 
on arithmetic probability paper, 607 
on logarithmic probability paper, 616 
using logarithmic horizontal scale, 614 
plotting, 73-76, 164-166, 168-170 
plotting when classes are unequal, 164- 
166 

Frequency distribution and range chart, 

92 

Funkhouser, H. G., 68n

G

Gallup, G. H., 32n
Galton, Sir F., 454n

Garfield, F. R., 236
Gauss, J. K. F., 590
Gaussian curve (see Normal curve)


General table, 53

Generic differences versus statistical
differences, 657-658
Gentile, Marion C., 738n
Geometric mean: 

compared with arithmetic mean, 199- 
201, 208-209, 418-420, 704-706, 

793 

compared with harmonic mean, 209, 
793-794 

definition of, 198 
from grouped data, 199, 616 
from ungrouped data, 198-199 
properties of, 198-200 
uses of : 

averaging ratios, 201 
finding rate of change, 201-203 
in index numbers, 411, 418-420 
in skewed distributions, 201, 616 
Geometric progression (see also Compound 
interest curve; Exponential curve): 
logarithms of, plotted, 98

plotted on arithmetic grid, 94 
plotted on semi-logarithmic grid, 99 
properties of, 94-95 
Glover, J. W., 291n 
Gompertz curve, 302-310 
as “law” of growth, 309-310 
charts of characteristic shapes, 303 
comparison with logistic, 316-318 
first differences of, 317 
fitting of, 302-309 
properties of, 302-303 
Gordon, R. A., 263n 
Gram-Charlier series, 619n 
Graphic method, advantages and limitations
of, 67-68

Graphic presentation (see specific type of
chart)

Gressens, O., 577n
Grossman, H. A., 692
Grove, R. D., 144n, 243
Growth curves, asymptotic (see Modified
exponential; Gompertz; Logistic)
Guilford, J. P., 447n, 472n

H

Hansen, M. H., 28n, 30n
Haphazard sample, 32-33 
Harding, P. L., 655 
Harmonic analysis of time series, 388 
Harmonic mean: 

compared with arithmetic mean, 204- 
207, 208-209 

compared with geometric mean, 209, 
793-794 

computation of, 203 
definition of, 203
properties of, 203 
uses of : 

averaging prices during crop year, 
207-208 

in index numbers, 416n, 420, 430 
in skewed distributions, 207 
numerator-term weights, 204-207 
Hartley, H. O., 688n, 741, 745, 747, 753,
759, 764, 765
Heim, M. H., 216
Hog-corn ratio, 109-111, 146
Holmes, B. E., 452
Hood, W. M., 335
Hotelling, H., 228n, 724n
Hurwitz, W. N., 28n, 30n
Hypothesis, null (see Null hypothesis)

I

“Ideal” index number:
criticism of, 411-412
factor reversal test, 427-428 
formula, 411 
time reversal test, 427 
Improprieties (see also Percentages, faulty 
use of) : 
bias, 7 

carelessness, 8-9
concealed classification, 11
confusion of association and causation,
9-10, 469-470
failure to define units, 11
insufficient data, 10
misleading totals, 11-12
non-comparable data, 9
non-sequitur, 9

omission of important factor, 7-8
poorly designed experiment, 12
unrepresentative data, 10-11
Independent variable (see Variable) 

Index numbers: 

aggregative:

price (see also Aggregative price index 
numbers), 405-413 
quantity (see also Aggregative quan- 
tity index numbers), 421-423 
average of relatives: 
price, 414-420 
quantity, 423-424 
bases, 404-405 

behavior of relatives, 398-400 
chain, 431-432 
changing weights, 434-436 
comparison of formulas, 420-421
contrasted with relative, 396-397 
data for, 401-404 
definition of, 394 
descriptions of: 

Agricultural Marketing Service Indexes 
of Prices Paid by and Received by 
Farmers and Parity Ratio, 439-441 
A. T. and T. Index of Industrial
Activity, 444 

Bureau of Labor Statistics: Consumer 
Price Index, 436-438; Wholesale 
Commodity Prices, 438-439 
Business Activity in Pittsburgh, 445- 

446 

Farm Housing in Oklahoma, 446- 

447 

Federal Reserve Index of Industrial 
Production, 442-444 
N. Y. Times Weekly Index of Business
Activity, 444-445
S. E. C. Index of Common Stock
Prices, 441-442
mathematical tests, 426-428
price, 405-421
problems, 397-398
quantity, 421-424

substituting, adding, or dropping commodities,
432-436
uses of, 394-396

weighting schemes, 406-413, 415-417


Industrial activity, index of, 444
Industrial production, index of, 442-444
Inference, statistical (see Significance
tests; Confidence limits)
Inspection trend, 262, 318, 341-347
Irregular variations:
computation of, 382-384
curves of, 383-384
explained, 251
smoothing of, 380-382

J

Jahoda, Marie, 13n
Jesbop, W. N., 46Gn 

K 

Kana, A. J„ 631-632 

Karpinos, B. D., 692 

Kelley, T. L., 553n 

Kendall, M. G., 480n, 594n, 638n

Keynes, J. M., 408n, 411, 429

Key punch (see Card punch)

King, W. I., 194, 428n
Koffsky, N. M., 439n
Kurtosis:

graphic illustrations of, 213, 233-234
measure of, 232-236 
significance test, 722 
Kurtz, E. B., 236 

L 

L: 

description, 705 
table of values of, 763 
used to compare several variances, 704-
706

Lacey, O. L., 12n
Lag, measurement of, 579-585
use in forecasting, 582-585
Laspeyres, E., 409
Latscha, R., 687n
Latter, O. H., 707
Lead, measurement of, 579-585
use in forecasting, 582-585
Least squares, 205-270, 796*799 
Leptokurtic distributions, 213, 233-235,
382, 722

Lettering of charts, 84
Lev, J., 30n, 626n, 638n, 720n
Lewis, R, E., 352n 
Lewis, T., 753 

Likelihood, criterion of (see L) 

Linder, F. E., 144n, 243
Link relatives, 339 

Literary Digest, sampling method of,
10-11, 33

Logarithmic chart, grid, or paper:
logarithmic horizontal scale, 614, 616
logarithmic horizontal and vertical
scales, 505

logarithmic vertical scale, 98-116, 245,
292, 294, 306, 504
semi-logarithmic chart, 98-110 
Logarithmic normal curve, fitting of, 613-
619

Logarithmic probability paper, 615 
Logarithms, common: 
explanation, 776 
table of. 777-7n 
Logistic curve, 310-316

as “law” of population growth, 315-316 
comparison with Gompertz, 310-318 
first difference of, 317 
fitting of:

by method of selected points, 310-315
by use of reciprocals, 310 
properties of, 310 
series of, 315-316 
skewed, 316 
Long cycles, 253 
Lowenstein, D., 126

M 

MacDonald, A., 455
Madow, W. G., 28n, 30n
Mahalanobis, P. C., 763
Map (see Statistical map)

Marshall, A., 410 
Marshall-Edgeworth formula, 410
Martin, J. H., 523
Mathematical proofs, 792-817
Mather, K., 721

Maximum variation chart, 85-87 
McMillan, R. T., 446

Mean (see Arithmetic mean; Geometric
mean; Harmonic mean; Quadratic 
mean) 

Mean deviation, 215 

Mean square contingency, coefficient of, 
481-482 
Median: 

definition of, 185 
graphic location: 
frequency curve, 193 
ogive, 187-188 
grouped data, 186-187 
ungrouped data, 185-186
use in index numbers, 420 
use in seasonal, 326 
Merrington, Maxine, 750, 759 
Mesokurtic distributions, 213, 232, 234, 607 
Miller, A. H., 710n
Miner, J. R., 653n 

Minor means (see Geometric mean;
Harmonic mean; Quadratic mean)
Misuses (see Improprieties)

Mitchell, W. C., 250, 389n, 392n
Mode: 

betas used in computation of, 190n 
definition of, 189 
graphic location: 
column diagram, 191 
frequency curve, 191, 193 
ogive, 191 

grouped data, 190-192 
ungrouped data, 189-190 
Modified exponential curve: 

charts of characteristic shapes, 299 
fitting of, 298-302 

formulas for constants, 302, 799-800 
properties of, 298-299 
Modified mean: 
forms of, 182-183 

use in computing seasonal index, 322- 
323, 335-339 
Modley, R., 126
Moments: 

correction of for grouping error, 237-239 
when applicable, 238, 621n
first moment, 229, 237 


Moments (cont.):

fourth moment, 232-236, 237-239
second moment, 231, 237-239
third moment, 229-232, 237
Mood, A. M., 640n, 719n
Moore, G. H., 389n, 390
Mosteller, F., 31n
Mouzon, E. D., Jr., 577n
Moving averages:

irregular movements, smoothing of, 380- 
382 

seasonal index, used in computing, 328- 
334 

Moving seasonal, 340-351 
Mudgett, B. D., 428n
Multiple-axis chart, 90 (see also Year- 
over-year chart) 

Multiple correlation: 

and explained variation, 534, 543, 548 
coefficient derived from simple and par- 
tial coefficients, 550, 556 
coefficient derived from simple coeffi- 
cients, 546n, 555-556 
curvilinear, 559-560 
effect of additional variables on, 534
effect of intercorrelations on, 545-546
estimating equations (see Estimating
equations) 

four or more independent variables, 549-
551

importance of individual independent 
variables, 557-559 
m variables, 549-551 
meaning of, 531-534 
multiple-partial, 551 

net coefficients of estimation, 533, 542, 
547, 550, 557 
non-linear, 559-560

normal equations (see Normal equations 
in correlation) 

population estimate of coefficients, 734 
regarded as simple correlation, 551 
significance tests of coefficients, 732-734 
standard errors of estimate (see Stand- 
ard error of estimate) 
three independent variables, 546-549
time as an independent variable, 573-575
two independent variables, 541-543
Multiple determination, coefficient of (see 
Determination, coefficient of) 
Multi-stage sample, 29 

N 

Nair, K. R., 311n
Nayer, P. P. N., 703 
Net balance chart, 85 
Net correlation (see Partial correlation) 
Newhall, S. M., 216 
N. Y. Times Weekly Index of Business
Activity, 444-445 
Neyman, J., 704n

Non-determination, coefficient of, 403 
Non-linear correlation:
logarithms used, 503-504, 508-512, 518-
519

means used, 520-524 
multiple, 559-560

population estimate of coefficient, 729- 
730, 732 

reciprocals used, 505-506, 519-520 
second-degree curve used, 486-491 
Non-linear correlation (cont.):

significance tests of coefficients, 726-732
square roots used, 504-505, 513-516
third-degree curve used, 493-498 
Normal, in time series, 380 
Normal curve of error (see Normal curve) 
Normal curve or distribution (see also 
Logarithmic normal curve): 
and binomial, 591-594 
and significance tests, 626-642, 663-667,
670-671, 673-675, 679-680, 681-683,
684-686, 724-725, 735
development from laws of chance, 690- 

fitting of:

areas, 599-603, 603-606 
ordinates, 596-599
formula for, 594, 595
historical development of, 590-591 
relation to chi-square, t, and F distribu-
tions, 720-721
table of ordinates, 744-746 
tables of areas, 746, 748, 749 
testing suitability of, 606-607, 690-691 
Normal equations explained, 265-270, 798- 
799 

Normal equations in correlation: 
multiple correlation: 

four or more independent variables,
550 

three independent variables, 546-547 
two independent variables, 542
non-linear correlation: 
second-degree curve, 486-489 
straight line to logarithms, 508, 518 
straight line to reciprocals, 519 
straight line to square roots, 513-514 
third-degree curve, 494-497 
two-variable linear correlation : 
grouped data, 475n 
ungrouped data, 456-457, 467, 491,
539

Normal equations in time series: 
second-degree curve, 285-286 
second-degree curve to logarithms, 296 
straight line, 270-272 
straight line to logarithms, 292 
third-degree curve, 288 
Normal probability curve (see Normal 
curve) 

Null hypothesis, 637 
not proven or disproven, 637

O 

Observation equations, 267-270 
Ogive, 168-170, 187-188, 191, 606
Orthogonal polynomials, 289-290

P

Paasche, H., 410
Paris, J. O., 394n, 426n
Parity index, 396, 439-441
Parity ratio, 440-441
Parkes, A. S., 610
Parten, Mildred B., 13n
Partial correlation: 

and explained variation, 534-535, 543-
545, 549

and net coefficient of estimation, 535


Partial correlation (cont.):
coefficient derived from lower-order 
coefficients, 552-555 

first-order coefficients, 543-546, 552-554
four or more independent variables, 550-
551

meaning of, 534-536
population estimate of coefficient, 736 
regarded as simple correlation, 551 
second-order coefficients, 549, 554
significance tests of coefficients, 734-735
third or higher-order coefficients, 555
three independent variables, 549, 554
time as an independent variable, 573- 
575 

two independent variables, 543-545, 552-
554

used in two-variable non-linear correla- 
tion, 493n 

Partial determination, coefficient of (see
Determination, coefficient of)
Pearl, R., 311n, 316

Pearl-Reed curve (see also Logistic curve),
310-316 

Pearson, E. S., 661n, 678-679, 688n, 704n, 
721, 741, 745, 747, 753, 759, 764, 766 
Pearson, K., 227, 451, 690n, 622, 741, 745,
747 

Percentage frequency distribution, 166-168
Percentages (see also Proportions; Rates;
Ratios) : 

averaging of, 151-152, 183-184, 680 
faulty use of, 149-152 
hundred per cent statement, 147-148 
rounding to total 100 per cent, 61-62,
140

significance tests, 661-680
Percentile measure of: 
dispersion, 214 
skewness, 229 
Percentiles, 187-189
Period data, 71-73
Periodic curve, 388 

Periodic movements (see also Seasonal 
movements; Seasonal indexes): 
explained, 246-249 

intra-year indexes (see Seasonal indexes) 
types of, 246, 249 
Peters, C. C., 237n, 482n
Physical volume or business activity,
indexes of, 442-446
Pictographs, 123-126
Pie diagrams, 126-130

Pittsburgh business activity, index of, 445-
446

Platykurtic distributions, 213, 232, 234-
236, 722

Playfair, W., 68n 
Point data, 71-73
Poisson distribution, 694n
Polynomial series: 

as estimating equation in correlation:
second degree, 486-491
straight line, 465-468, 491-493
straight line to logarithms, 503-504,
508-512, 518-519

straight line to reciprocals, 505-506,
519-520

straight line to square roots, 504-505,
513-516

third degree, 493-498 
Polynomial series (cont.):
as trend in time series: 
fifth degree, 282-283 
fourth degree, 282-283 
second degree, 282-283, 285-288 
second degree to logarithms, 295-297 
straight line, 263-275 
straight line to logarithms, 290-294 
third degree, 282-283, 288-289 
orthogonal, 289-290 

Population, estimate of (see also Confidence 
limits) : 

coefficients of determination: 
multiple, 734 
non-linear, 729-730, 732 
partial, 736 

two-variable linear, 725-726
correlation coefficient (see coefficients of 
determination, above) 
proportion, 680 

standard deviation, 644, 810-812
variance, 644, 810-812 
Population changes, adjustment for, 256- 
257 

Powers of natural (and odd natural) num- 
bers, sums of, 740-743 
Precision, measure of, 221-222 
Prefatory notes in tables, 61 
Prescott, R. B., 243n, 309n 
Presentation of data (see Data, statistical,
presentation of) 

Price changes, adjustment for, 257-258, 394 
Price index numbers (see Aggregative price 
index numbers; Index numbers) 

Price relatives: 
behavior of, 398-400 
contrasted with index numbers, 396-397 
definition of, 414 

used to construct index numbers, 414-417 
Prices paid by and received by farmers, in- 
dexes of, 439-441 
Primary source, 45-49 
Probability paper:
arithmetic, 607 
logarithmic, 615 
Proofs, mathematical, 792-817 
Proportions, chart, 81-83 
Protractor, percentage, 127, 129 
Punch card, 42, 44 
Purposive sample, 31 

Q 

Quadrants, for curve plotting, 68, 70 
Quadratic mean, 209 
Qualitative distributions, correlation of, 
480-482, 524 

Quality, control of, 30, 643 
Quantity index numbers (see Aggregative 
quantity index numbers; Index 
numbers) 

Quantity relatives, used to construct index 
numbers, 423-424 
Quartile deviation, 216 
Quartile measure of: 
dispersion, 216 
skewness, 229 
Quartiles, 187-189 
Questionnaire, 16 
Quintiles, 187-189 
Quota sample, 31 


R

Randall, C. K., 439n 
Random point sample, 31
Random sample, 26-27, 626
Range, 214 
Range charts, 87 

Ranked data, correlation of, 478-480 
Rates:
birth, 144
death, 143-144 
use of term, 136n 

Ratio chart (see Semi-logarithmic chart)
Ratio of determination (square of correla-
tion ratio), 622

Ratios (see also Percentages; Proportions; 
Rates) 
averaging:

arithmetically, 151-152, 183-184, 680
arithmetic versus geometric mean, 200-
201, 418-420
calculation of, 136-138
effect of changing base, 138-139
faulty use of percentages, 149-152
illustrations of use, 141-149 
recording percentages, 139-140 
types of, 140-141 
Reciprocals, table of, 766-775 
Reed, L. J., 315

Reference-cycle analysis, 388-392
Reference table, 53 
Registration, 16 

Reliability (see Significance tests) 

Reproduction of charts, 81 

Research methods, 12-14 

Reverse J curve, 164 

Rietz, H. L., 613n

Romig, H. G., 666n, 668n

Ross, F. A., 743 

Ross, J. E., 447n 

Rounding, 139-140, 818-822 

Rugg, H. O., 482n, 745, 746 

Ruling: 

curves, 80-81 
tables, 64 

S 

Sample (see also Significance tests):
as used by or in: 

American Institute of Public Opinion, 
32 

Census of Manufacturing, 25-26 
index numbers, 402-404 
Literary Digest, 10-11, 33 
bias in, 31, 33-34 
test of stability misleading, 33 
types of samples: 
area, 29 
cluster, 28-29 
haphazard, 32-33 
multi-stage, 29 
purposive, 31 
quota, 31 
random, 26-27 
random point, 31 
sequential, 30-31 
stratified, 29-30 
systematic, 27-28 

Sample values, tests of (see Significance
tests) 

Sasuly, 388n 
Scale labels, 84

Scatter, zones of (see Standard error of
estimate)

Scatter diagram, 451-452, 466-467 

Scatter ratio, 512 

Schedule: 

editing, 36-37 
illustrations, 19-21 
making, 18-25 
meaning of term, 16 
tabulating from, 37-45
use of, 34-36 

Schumacher, F. X., 457, 488
Score sheet, 40

Seasonal indexes (see also Seasonal move-
ments):

amplitude adjustment, 360-361 
changing, 340-351 
combination types, 362 
constant:

link relative, Persons, or Harvard, 339
per cent of moving average, 326-339
per cent of trend, or Falkner, 324-326
continuity of, 361-362 
Easter adjustment, 352-359 
logical basis, 362-363 
moving, 340-351 
stable (see constant, above) 
sudden changes in, 359 
tests of, 339, 372 

timing, short time shifts in, 359-360
Seasonal movements (see also Periodic 
movements) : 
adjustment for:

by division, 367-371 
by subtraction, 372-373 
nature of, 246-248 
reasons for interest in, 248-249 
types of (see also Seasonal indexes), 246-
248 

Seasonal variation (see Seasonal move- 
ments) 

Secondary source, 45-49 
Secondary trend, 253 

Second-degree curve (see Polynomial series)
Second-order partial correlation coefficients,
549, 554-555
Secrist, H., 482n
Secular trend (see Trend) 

S. E. C. Index of Common Stock Prices, 
441-442 

Selected points, for fitting logistic curve, 
310-315 

Semi-interquartile range, 215 
Semi-logarithmic chart (see also Loga- 
rithmic chart) : 
applications of, 105-113 
construction of scale, 100-102, 113-116
cycles, 100-101, 113 

expansion and contraction of scale, 113- 
114 

explained, 98-102 
interpretation of, 103-105 
principles of construction, 100-102, 113-
116

purpose of, 93, 98
Semi-tabular presentation, BO-^fil 
Sequential sampling, 30-31 
Sheppard’s corrections, 237-239, 621n
Shewhart, W. A., 238n, 619n, 621n, 627-
631, 702, 747


Significance: 
and value of P, 638-641 
criterion of, 640-641 
level of, 635 

Significance ratio, 637, 645 
Significance tests (see also Confidence 
limits) : 

analysis of variance (see Analysis of 
variance) 

chi-square, 681-693, 699-702 
errors in, 639 
F (see F) 

likelihood, criterion of (see L) 
of difference between observed and com- 
puted frequencies, 679-680, 684-693 
of difference between observed and 
population frequencies, 661-679, 
681-684 

of difference between sample and popu- 
lation values: 
arithmetic means, 635-650 
betas, 720-722

coefficients of determination, 722-724, 
725-736 

correlation coefficients, 722-724, 725- 
736 

proportions, 661-679, 681-684 
standard deviations, 699-702 
variances, 699-702 

of difference between two sample values: 
arithmetic means, independent sam- 
ples, 651-654 

arithmetic means, non-independent 
samples, 654-657 

coefficients of determination, 724-725 
correlation coefficients, 724-725 
proportions, 679-680, 684-693 
standard deviations, 702-704 
variances, 702-704, 706-720 
of several variances, 704-706
of slope of linear estimating equation, 723 
one tail versus two tails, 637-638 
t (see t) 

variance, analysis of (see Analysis of 
variance) 

z (see z transformation)

Significant digits, 818-822
Silhouette chart, 85-86 
Simple correlation (see Two-variable linear 
correlation) 

Sine-cosine curve, 388 
Skewed curve: 

fitting of by use of logarithms, 613-619
fitting of normal curve with adjustment 
for skewness, 619-623 
Skewness:

absolute versus relative, 227 
meaning of, 225 
charts, 212, 226 
measures of relative: 

Pearsonian, 227-229 
using percentiles, 229 
using quartiles, 229
using third moment, 229-232 
significance test, 720-722 
Smalley, C. W., 249n 
Small-number methods, 657 
Smith, J. G., 641n
Snedecor, G. W., 160n
Solomons, L. M., 228n
Sorter, electric (see Electronic statistical
machine) 

Source note: 
of chart, 85
of table, 61 
Sources of data: 

comparability of, 48-49
primary, 45
secondary, 45-46 
selected list of, 823-829 
suitability of, 46-47 
Spear, Mary E., 76n 

Spearman rank correlation coefficient, 478-
480

Specific-cycle analysis, 392 
Spillman, W. J., 496
Square roots, table of, 766-775 
Squares, table of, 766-776 
Stamp, Sir J., 15n, 150n 
Stanberry, Van B., 112n
Standard deviation: 

and area under normal curve, 219-221, 
746, 748, 749 

correlation when in terms of, 465n, 571- 
573 

grouped data, 217-220 
of population, 217, 633 
of population, estimated value, 217, 644, 
651-652 

of sample, 217, 644 
properties of, 219-222 
ungrouped data, 215-217 
used in comparing cyclical movements, 
384-387 
Standard error: 

of arithmetic mean, 633, 807-810 
of difference between two arithmetic 
means, 651, 812-813, 814 
of difference between two proportions, 
680 

of proportion, 665, 814-816 
of z, 724, 735

Standard error, estimated: 
of arithmetic mean, 644 
of difference between two arithmetic 
means, 652 

of difference between two proportions,
680 

Standard error of estimate: 
multiple correlation: 

effect of additional variables on, 543, 
548 

four or more independent variables, 550 
three independent variables, 548 
two independent variables, 534, 543 
non-linear correlation: 
second-degree curve, 490 
straight line to logarithms, 511-512,
519

straight line to reciprocals, 520 
straight line to square roots, 516 
third-degree curve, 498 
two-variable linear correlation: 
grouped data, 477 
ungrouped data, 454, 458-461, 468,
492, 540

Standard scores, 224-225, 571 
Statistical data (see Data, statistical) 
Statistical differences versus generic differ- 
ences, 657-658 

Statistical inference (see Significance tests) 


Statistical map: 
dot, 131-133
hatched, 131
pin, 132-134

Statistical method, 1-2, 12
Statistical reports, tables in, 65-66 
Statistical tables (see Tables, statistical) 
Statistics: 

definition of, 1
origin of, 2
Stauber, B. R., 439n
Stein, H., 117n 
Stencils for lettering, 84 
Stewart, Leonora, 596 
Storie, R. E., 447n
Straight-line trend: 

equation explained, 263-265 
least-squares fit: 

adapting equation to monthly data, 
275-278 

even number of years, 273-275
fitted to logarithms, 290-294 
normal equations, 267-270, 798-799 
observation equations, 267-270 
odd number of years, 270-273 
reason for use of, 265-270 
Stratified sample, 29-30 
Stryker, H. E., 126n
Stuart, A., 693n
Student (W. C. Gosset), 751 
Summary table, 53-54 
Sums of powers of natural numbers, 740- 
741 

Sums of powers of odd natural numbers,
742-743 

Systematic sample, 27-28 

T 


t:
and significance test for arithmetic means,
645, 653, 656

and significance test for correlation coeffi- 
cients, 722-723, 727, 729, 734-735 
and significance test for slope of linear 
estimating equation, 723 
curves of, 646 
distribution of, 646 

relation to normal, chi-square, and F dis- 
tributions, 720-721 
table of values of, 750-761 
Tables, statistical:

arrangement of entries, 56-60

comparisons, 64-65 

emphasis, 56 

footnotes, 61 

guiding the eye, 65 

percentages, use of, 61-62 

prefatory notes, 61 

reproduction of, in reports, 66 

rounding numbers, 62-63 

ruling, 64-65 

size and shape, 63-64 

source notes, 61 

title and identification, 60 

totals, 63 

type, size and style, 65 
types of, 53 
type-written, 65-66 
units, 63 

Tabular presentation (see Tables, statistical)
Tabulation:

hand sorting, 39

mechanical, 39-45

score or tally sheet, 37-39 
Tabulator, electric (see Electronic statistical 
machine) 

Tally sheet, 40 
Tchebycheff’s inequality, 221 
Text table, 53-54

Third-degree curve (see Polynomial series) 
Thompson, Catherine M., 753, 759 
Thorson, G., 595 

Time element in correlation (see Time series 
correlation)

Time reversal test, 427
Time series:

correlation of (see Time series correlation) 
movements in: 

cyclical, 249-251, 373-382, 384-387 
irregular, 251, 382-384 
long cycles, 253

periodic, 246-249, Ch. 14, Ch. 16
trend, secondary, 253 
trend, secular, 240-246, Ch. 12, Ch. 13 
plotting of, 71-73 

preliminary treatment of data, 253-259 
Time series correlation (see also Lag) : 
adjusting for trend and seasonal by use of 
cyclical relatives, 578-585
adjusting for trend by use of : 

absolute deviations from trend, 575 
first differences, 575-576 
percentage differences, 575-576 
percentages of trend, 563-573
equivalence of use of absolute deviations
and partial correlation, 575-576
problems involved, 576-578 
unadjusted data, 562-563 
use of multiple and partial correlation, 
573-575 

Tippett, L. H. C., 638n, 710n 
Title: 

of chart, 84-86

of table, 60 
Totals in table, 63 
Total variation in: 

analysis of variance, 708-709, 712, 714 
multiple correlation: 

four or more independent variables, 550 
three independent variables, 547
two independent variables, 534, 543
non-linear correlation:
correlation ratio, 621-622
second-degree curve, 490
straight line to logarithms, 509-510,
518

straight line to reciprocals, 519-520
straight line to square roots, 514
third-degree curve, 497
two-variable linear correlation, 462-464,
467, 491, 539-540

Trend: 

adjustment for, 366-367, 378 
empirical test of data, 318-319 
explained, 240-246 
fitting of:

asymptotic growth curves, 297-318
Gompertz, 302-310
inspection trend, 262, 318
logistic, 310-316

modified exponential, 298-302

polynomials (see Polynomial series)


Trend (cont.):
inter-cycle, 389 
intra-cycle, 389 
nature of, 240-246 
secondary, 253 

secular, 240-246, Ch. 12, Ch. 13
selection of period, 278-280 
selection of type, 318-319 
Tukey, M. W., 31 
Two-variable linear correlation:

coefficient of correlation and slope of 
estimating equation, 465-466 
coefficient of determination:

and explained variation, 461-464 
and proportions of common factors, 
464n 

concepts, 451-465 
estimating equation, 455-458 
grouped data, 473-478 
normal equations, 455-457 
population estimate of coefficients, 725-
726

product-moment formula, 465-466 
qualitative data, 480-482 
ranked data, 478-480 
results compared with: 

multiple correlation, 543, 548 
non-linear correlation, 491-493 
partial correlation, 545, 553-554 
scatter diagram, 451-452, 466-467 
significance tests, 722-726 
standard error of estimate, 458-461 
ungrouped data, 466-469 
Type I and Type II errors, 639, 719n
Typewriter, use of:
in chart lettering, 84
in table construction, 65-66

U 

Unbiased estimate (see Population 
estimate) 

Unexplained variation in: 
multiple correlation: 
four or more independent variables, 
550 

three independent variables, 547
two independent variables, 534, 543
non-linear correlation:
second-degree curve, 490
straight line to logarithms, 511, 519
straight line to reciprocals, 520
straight line to square roots, 514
third-degree curve, 497
two-variable linear correlation, 461-462,
468, 492, 539-540
Units, how shown in table, 63 
U. S. Bureau of Labor Statistics indexes:
consumer prices, 394, 436-438 
wholesale commodity prices, 398-400, 
438-439 


V 

Van Voorhis, W. E., 237n, 482n
Variable: 

continuous and discrete, 161 
independent and dependent, 452, 532 
Variance: 

analysis of (see Analysis of variance) 





Variance (cont.):

of population, 217, 634 
of population, estimated from: 

column means, 709-710, 713-714, 718-
720

interaction, 718-719
interaction and variation within boxes, 
719-720 
one sample, 644 
residual variation, 713-714 
row means, 713-714, 718-720 
several samples, 651n
two samples, 651-652 
variation within boxes or cells, 718-719 
within columns, 709-710 
of sample, 216 

Variance of population, estimated (see 
Variance) 

Variation: 

additive nature of, 461-462, 709 
and coefficients of determination (see
Explained variation)
between column means, 706-708, 712,
715, 815-816

between row means, 712-713, 715
coefficient of, 222-223 
due to interaction, 718 
explained (see Explained variation) 
residual, 713 

total (see Total variation) 
unexplained (see Unexplained variation) 
within boxes or cells, 716-717 
within columns, 708-709, 816 
Varying horizontal-scale charts, 89-90 
Verhulst, P. F., 316
Vignec, A. J., 732n


W 

Wald, A., 30n

Walker, Helen M., 30n, 68n, 690n, 626n,
638n, 720n

Washburn, E. S., 523 

Weekly Index of Business Activity, N. Y.
Times, 444-445
Weld, L. D., 590
West, Helen, 696 

Wholesale Commodity Prices, Index of, 
141-142, 398-400, 438-439 
Wiley, N. C., 645 
Winfrey, R., 236 
Wingfield, A. H., 472
Winston, Ellen, 447n
Winston, S., 163, 226, 228
Working, H., 207n

Working days, flexible calendar of, 738-739 

Y 

Yates, F., 27n, 289n, 750, 753, 759
Yates’ correction, 666-667, 671, 689
Year-over-year chart, 90, 369, 372
Yule, G. U., 480n, 694n

Z 

Z chart, 87, 89 

Zero on vertical scale of charts, 76-80
Zero-order coefficients, 545
z transformation, 723-725, 736 



